ABSTRACT

An on-line, general-purpose, fact-retrieval system is presented
which employs a classificatory data structuring technique. The
technique embraces the basic concept of hierarchical classification
of data and provides users with multiple avenues of access to a data
file. Additionally, the data file may be partitioned into unrelated
data sets.
I.  INTRODUCTION

The term "information retrieval" and the initials "IR" were
coined by the editors of Fortune about ten years ago. However,
Vannevar Bush first formally declared the necessity for an
information retrieval discipline in his "As We May Think" article,
which was written for Atlantic Monthly in 1945. The United States
Government and those people involved in Library Science were truly
the first innovators of this discipline in the mid-fifties. The
technological explosion being felt at that time prompted government
agencies and library scientists to search for more efficient systems
for indexing, storing, and retrieving documents. The primary concern
was the assurance that vital technical information would be available
to all possible users. The discipline of information retrieval as we
know it today emerged as a result of this technological explosion.

Information retrieval has been defined in numerous ways. However,
all definitions share a common point which is best stated by
Taube [Ref. 1] as: "The right information made available to the right
person at the right time." Bourne [Ref. 2] states that "Information
retrieval has become a generic term, firmly established through
common usage, which includes reference, fact, and document
retrieval." Bourne also differentiates between data processing and
information retrieval. The former includes the manipulation,
replacement, alteration, or addition to the data on file, while the
latter is concerned with the storage of data in unaltered form for
later re-use. Use of the term "information retrieval" in this paper
implies the generic meaning stated by Bourne.

This paper is devoted to the investigation of a data structuring
concept proposed by Kildall [Ref. 3] for use in a general-purpose
fact-retrieval system. Before investigating Kildall's proposal in
section VI, the techniques of indexing, storage, and retrieval
established for Library Science purposes will be reviewed. These
basic techniques form a foundation for the design of specific IR
systems.

Information  retrieval  is  divided  into  three  major  operatives: 

1.  Indexing   (classification,    description,    and  structuring  of 
information  sources). 

2.  Storage  (organization  and  storage  of  files). 

3.  Retrieval (searching and displaying information).

Figure 1 is a simplified diagram which illustrates a typical
information retrieval process. An index is constructed which
describes the information source (document or record) and is stored
in a file along with the source itself. A request for information
(query) is directed to the index file where the location of the
requested document within the information file is found. A search of
the information file then results in the retrieval of the document.
This process is analogous to the indexing and storing of new books
received in a library, and the search for information by a library
patron.
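
The flow just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the system discussed in this paper: the dictionaries `index_file` and `information_file`, the functions `store` and `retrieve`, and the sample sources are hypothetical names and data introduced here.

```python
# Hypothetical in-memory stand-ins for the index file and information file.
index_file = {}        # index term -> set of locations in the information file
information_file = {}  # location -> stored source (document or record)

def store(location, source, terms):
    """Indexing and storage: file the source and record its location under each term."""
    information_file[location] = source
    for term in terms:
        index_file.setdefault(term.lower(), set()).add(location)

def retrieve(term):
    """Retrieval: consult the index file first, then fetch from the information file."""
    locations = index_file.get(term.lower(), set())
    return [information_file[loc] for loc in sorted(locations)]

store(101, "Principles of Automated Information Retrieval", ["information", "retrieval"])
store(102, "File Organization Techniques", ["storage", "retrieval"])
```

A query such as `retrieve("retrieval")` first yields locations from the index and only then touches the stored sources, which mirrors the two-step search described above.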


Figure 1
Basic Flow Diagram of the Information Retrieval Process
[Diagram not reproduced. Recoverable labels: Information Source,
Indexing Process, Index Record, Index File, Information File,
Storage, Query, Search, Information.]
II.  INDEXING

Indexing is the classification, description, and structuring of
information in such a manner that retrieval of the information is
accomplished expeditiously. This task is performed on information
sources such as books, documents, and files and is an integral part
of the information retrieval process. Since retrieval is the
counterpart of indexing, the indexing and retrieval schemes used in
an IR system must be compatible in order for a user to communicate
with the system. Clearly, retrieval efficiency (i.e., ease and speed
of retrieving desired information with a minimum of false drops¹) is
related to the efficiency and consistency of the indexing process.

As  a  rule,    the  information  base  of  an  IR  system  is  specialized 
and  as  such  requires  a  professional  jargon.     Ideally,    the  indexer  and 
system  user  are  experts  in  this  professional  language.     However,    this 
may  not  necessarily  be  true  and  causes  a  problem  commonly  confronted 
by  IR  system  designers.     The  problem  is  how  to  structure  specialized 
data  for  input  to  the  system  in  a  manner  that  is  convenient  to  both  the 
indexer  and  user  while  maintaining  data  accessibility.     An  example  of 
an  indexing  language  is  the  Dewey  Decimal  System  used  for  indexing 
library  books. 

¹ Output of irrelevant information as a result of a retrieval
request is called a "false drop."

Selection of an indexing language is based upon the following
considerations:

1.  The language should be convenient to use, such as natural
language or a language that could be easily learned.

2.  Computerized systems require that the language be rigid
enough to be usable in the machine but must also remain convenient
for human utility.

3.  The  vocabulary  should  be  broad  enough  to  allow  accurate 
description  of  the  information. 

4.  The  language  should  be  flexible  enough  to  allow  modification 
as  changes  in  information  occur. 

There are numerous indexing languages in use today, each
tailored to suit specific usage of the IR system. Therefore, indexing
languages normally reflect the viewpoint of the system designer in
his attempt to organize the system's data base to best suit the needs
of the user. Several indexing techniques which evolved from Library
Science will be reviewed in the sections that follow. These
techniques appear to form the nucleus from which specialized systems
are formed. Although the techniques are primarily oriented toward
document indexing, variations are used in all types of IR systems.
The techniques are presented in ascending order of:

1.  Effort  on  the  part  of  the  indexer. 

2.  Difficulty  in  automating. 

3.  Indexing power.

4.  Retrieval  efficiency. 

A.  UNIT-TERM INDEXING

The simplest indexing technique involves the extraction of
descriptive words from the information source. The source is then
associated with each of the terms used to describe its content. In
the case of a library book, or other document, descriptive words
may be taken from the title, abstract, or the text itself. This
technique requires a minimum of effort (other than reading the
source) on the part of the indexer. In addition, the indexing is
accomplished rather quickly since the indexer need not be intimately
familiar with the subject material. Unit-term indexing is
particularly advantageous when no information is available on the
spread of subject material. The addition of new material to the data
base is easily accomplished by expanding the vocabulary (unit-terms)
to include new descriptive words. However, unit-term indexing lacks
rules for combining terms into units which have meaning. This
shortcoming causes indexing problems when synonyms, plural word
forms, and generically related terms are encountered in the source
document.

The search device used in such a system is an alphabetical
listing (indexing record) of the key words used by the indexer. In
general, the information source is listed with each key word and
is used as a source descriptor, or the listing may indicate the
location of the source, or both. It is possible that the user will
have difficulty in using this system unless he knows precisely the
topic that he is searching for. An analogy may be drawn to searching
the telephone book for a name when the spelling of the name is not
known. Therefore, this indexing scheme is often utilized in IR
systems where the user is familiar with the professional jargon
contained in the information sources (e.g., technical libraries).

An excellent example of this subject-indexing² technique is the
Uniterm Coordinate Indexing System, which dates back to 1952. The
Uniterm ("unit-term") System includes fifteen rules governing the
indexer's operation, rules for determining key words, methods for
processing word meanings, and cross-referencing techniques. Some
agencies using this system have drafted standard unit-terms (key
words) to be used by indexers. However, this is unnecessary for an
unstructured language since new unit-terms may be added without
perturbing the existing system. An example of an index that might
be constructed from a Uniterm System is shown below. The numbers
below the unit-terms might represent reference serial numbers, or
library call numbers.

ABLATION 

452    573      772 

ADSORPTION 

137    459    823     1201 

ADHESIVE 

491 

AERODYNAMIC 

139    241     242    357    552     1010     1168 


² "Subject indexing," "keyword indexing," and "coordinate
indexing" are terms commonly used to describe the technique presented
here.
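
A unit-term index of the kind exhibited above can be sketched as a mapping from each term to the serial numbers filed under it. The following Python is a minimal illustration, not the Uniterm System's actual rules; the function names `build_unit_term_index` and `coordinate`, and the assignment of terms to serial numbers, are hypothetical. The `coordinate` function assumes that a multi-term request is answered by intersecting the lists of serial numbers.

```python
from collections import defaultdict

def build_unit_term_index(sources):
    """Map each unit-term to the sorted serial numbers of the sources it describes."""
    postings = defaultdict(set)
    for serial, terms in sources.items():
        for term in terms:
            postings[term.upper()].add(serial)
    # Alphabetical listing of key words, each with its serial numbers.
    return {term: sorted(postings[term]) for term in sorted(postings)}

def coordinate(index, *terms):
    """Return the serial numbers filed under every one of the given unit-terms."""
    hits = [set(index.get(t.upper(), ())) for t in terms]
    return sorted(set.intersection(*hits)) if hits else []

# Hypothetical sources: serial number -> unit-terms chosen by the indexer.
sources = {
    452: ["ablation", "aerodynamic"],
    491: ["adhesive"],
    573: ["ablation"],
    1010: ["aerodynamic", "ablation"],
}
index = build_unit_term_index(sources)
```

Adding a new unit-term only adds a new key to the mapping, which is consistent with the remark that the vocabulary can grow without perturbing existing entries.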
B.  KEY-WORD-IN-CONTEXT INDEXING


Another very common subject indexing technique is called
"Key-Word-In-Context" (KWIC) indexing³. The indexing power of KWIC
is very slightly greater than the simplest of subject indexing
techniques since the key word is shown in the context of the entire
subject. There are several variations in KWIC format but essentially
it is an alphabetical listing of key words. Whole phrases are
extracted from the source so that a user can easily determine the
role of the key word. The distinguishing feature of KWIC is its
display format, shown in the example below. Let us suppose that the
title of a source document is: "Principles of Automated Information
Retrieval." Assuming that the indexer selects four key words to
describe the source, the KWIC index would appear as:

    5135 Principles of AUTOMATED Information Retrieval
    iples of Automated INFORMATION Retrieval 5135 Princ
    ion Retrieval 5135 PRINCIPLES of Automated Informat
    omated Information RETRIEVAL 5135 Principles of Aut

Note that "automated", "information", "principles", and "retrieval"
are individual key words. A user desiring this source document
could find it by using any one of the four key words. Note also that a
user may find this system easier to use than the Uniterm System if he
is unfamiliar with the subject material.


³ Also referred to as "permuted" or "permuted title" indexing.
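
The wrap-around display in the KWIC exhibit above can be reproduced by treating the serial number and title as a ring of characters and rotating it so that each key word begins in the same column. The sketch below is one possible way to do this; the function name `kwic_lines` and the column position `left=19` are assumptions chosen to match the exhibit, and the code assumes each key word occurs once in the title.

```python
def kwic_lines(serial, title, key_words, left=19):
    """Rotate 'serial title ' as a ring so each key word starts at column `left`."""
    ring = f"{serial} {title} "
    lines = []
    for word in sorted(key_words, key=str.lower):
        # First occurrence of the key word (assumed unique in the title).
        start = ring.lower().index(word.lower())
        marked = ring[:start] + word.upper() + ring[start + len(word):]
        shift = (start - left) % len(ring)
        lines.append((marked[shift:] + marked[:shift]).rstrip())
    return lines

lines = kwic_lines(
    5135,
    "Principles of Automated Information Retrieval",
    ["Principles", "Automated", "Information", "Retrieval"],
)
```

Because the lines are sorted on the key word and the key word always falls in the same column, a reader can scan that one column alphabetically, which is the feature that distinguishes the KWIC display.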




C.  THESAURUS

Indexing power may be increased further by determining generic
relationships between key words. The Armed Services Technical
Information Agency (ASTIA) and the Defense Documentation Center (DDC)
have produced thesauri which are alphabetical lists of indexing terms
with related terms and "see" references. These lists are used by
indexers as a means of standardizing their operation. In other words,
indexers describe similar information sources in consistent fashion.
These thesauri define some hierarchy in key words and are useful to
the user as well as the indexer since they allow the user to
formulate queries with the exact terms used by the indexer. An
example of a thesaurus entry borrowed from Meadow [Ref. 4] is
exhibited below.

    COMPUTERS
        (Computers and Data Systems)
        Includes:
            Calculating machines
        Generic to:
            ANALOG COMPUTERS
            ANALOG-DIGITAL COMPUTERS
            BOMBING COMPUTERS
        Also see:
            DATA PROCESSING SYSTEMS
            SIMULATION

    Computing gun sights use GUN SIGHTS
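
One way a thesaurus entry of this kind might be used at query time is sketched below in Python. The dictionary layout, the field names `includes`, `generic_to`, and `also_see`, and the functions `normalize` and `expand` are all hypothetical; they encode only the entry exhibited above and are not taken from the ASTIA or DDC thesauri themselves.

```python
# Hypothetical encoding of the thesaurus entry exhibited above.
THESAURUS = {
    "COMPUTERS": {
        "includes": ["CALCULATING MACHINES"],
        "generic_to": ["ANALOG COMPUTERS", "ANALOG-DIGITAL COMPUTERS", "BOMBING COMPUTERS"],
        "also_see": ["DATA PROCESSING SYSTEMS", "SIMULATION"],
    },
}
# "use" references: non-preferred term -> preferred indexing term.
USE = {"COMPUTING GUN SIGHTS": "GUN SIGHTS"}

def normalize(term):
    """Replace a user's term with the indexer's preferred term, if a 'use' reference exists."""
    term = term.upper()
    return USE.get(term, term)

def expand(term, narrower=True):
    """Return the preferred term plus, optionally, the terms it is generic to."""
    term = normalize(term)
    entry = THESAURUS.get(term, {})
    return [term] + (entry.get("generic_to", []) if narrower else [])
```

Here `normalize` lets the user arrive at the exact term the indexer used, and `expand` widens a general query to the more specific terms filed beneath it.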

D.  HIERARCHICAL CLASSIFICATION

Probably the most widely used indexing technique is that of
hierarchical classification, where a universe of information is
repeatedly divided and sub-divided into a classificatory tree. This
index language has a very tightly controlled but simple vocabulary
contained in an authority list of key words provided with the
classification system. Each key word in the authority list is
assigned a numeric or alphanumeric code (mnemonic codes could be used
but normally are not). As can be seen in the tree structure exhibited
below, a key word contains all those key words generic to it (i.e.,
above it in the branch of the tree from which it was derived).
Hierarchical schemes allow the indexer to describe an information
source in generic levels so that the user may formulate his query in
more general or more specific terms by moving up or down the
classification tree.

Modification of key word meaning is difficult to accomplish
since changing one word in the tree affects all key words
subordinate to it. However, changes at the bottom of the tree are
easily made since no perturbation of the tree occurs. Expansion of
the vocabulary used in this system is readily accomplished by
expanding the tree horizontally.

The  most  well  known  hierarchical  systems  are  the  Dewey 
Decimal  Classification  System  (exhibited  below),  the  Library  of 
Congress  System,    and  the  Universal  Decimal  Classification  System. 


    500       Pure Science
    510       Mathematics
    519       Probabilities and Statistical Mathematics
    519.9     Treatment of Data
    519.92    Programming (linear and dynamic)
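
Moving up the classification tree from a specific code to its generic levels can be sketched as follows. The Python below is an illustration only: it assumes that the hierarchy is encoded directly in the digits of three-digit Dewey codes (as it is in the branch exhibited above), which is a simplification, and the names `DEWEY`, `parent`, and `generic_chain` are hypothetical.

```python
# The branch of the Dewey Decimal tree exhibited above.
DEWEY = {
    "500": "Pure Science",
    "510": "Mathematics",
    "519": "Probabilities and Statistical Mathematics",
    "519.9": "Treatment of Data",
    "519.92": "Programming (linear and dynamic)",
}

def parent(code):
    """Return the next more generic code, or None at the top of the branch."""
    if "." in code:
        whole, fraction = code.split(".")
        return whole if len(fraction) == 1 else f"{whole}.{fraction[:-1]}"
    significant = code.rstrip("0")          # "510" -> "51", "500" -> "5"
    if len(significant) <= 1:
        return None
    return significant[:-1].ljust(3, "0")   # "51" -> "5" -> "500"

def generic_chain(code):
    """List every code above `code` in its branch, nearest first."""
    chain = []
    code = parent(code)
    while code is not None:
        chain.append(code)
        code = parent(code)
    return chain
```

A user who finds too little under 519.92 could broaden the query by stepping through `generic_chain("519.92")`, one generic level at a time.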


E.  FACETED INDEXING

In the immediately preceding section a classification technique
was presented which structures a topic (universe of information) by
dividing and subdividing it to form a classificatory tree. Faceted
indexing deals with individual key words taken from the data source
and grouped into categories with respect to their usage within the
source. Terms within each group are structured into a classificatory
tree. A term extracted from the source is analyzed from several
points of view and a group of indexing terms is synthesized to
describe the key word in context. This technique is referred to as
"facet analysis," "faceted indexing," and "relational indexing," where
each key word's point-of-view analysis is called a facet.

An excellent example of faceted indexing is given by Meadow
[Ref. 4]. Let us suppose that "steel" is a key word taken from a
source  document.     The  document  contains  information  relating  to  the 
manufacture,    use,    chemical  analysis,    and  properties  of  steel.     By 
appending  descriptors  to  the  key  word  "steel"  the  following  index 
terms  are  created: 

STEEL,    manufacture  of 

STEEL,    use  in  automobiles 


These index terms are not predefined in any authority list but
are constructed by the indexer by appending descriptors to the key
word. The terms follow some syntactic rule such as: subject followed
by modifier, followed by operation modifier. The utility of this
technique is that the indexer, armed with a descriptor list and
syntactic rules tailored to suit the particular IR system, may
analyze a source from many points of view and construct index terms
that describe the information content in great detail.

F.  AUTOMATIC INDEXING

In  the  foregoing  discussions,    it  was  assumed  that  the  indexer 
was  human.     A  treatment  of  automatic  (computer)  indexing  is  now  in 
order. 

Automatic indexing is difficult to accomplish for two main
reasons. First, the information source must be in machine readable
form. In the case of books or other lengthy documents this is a very
expensive requirement. However, development of character recognition
devices and the production of transcripts in machine code as a
by-product of automatic typesetting have eased the cost of this
requirement. The second problem, and the more serious, is the
development of algorithms or heuristics which derive meaning from
strings of characters. This is an area of Artificial Intelligence in
which a good deal of research has been expended. However, the results
of this research have been empirical since we lack sophisticated
linguistic and semantic knowledge. References 5, 6, 7, and 8 contain
excellent treatments of the research conducted and problems involved
in machine translation of natural language, while Ref. 9 contains a
comparison of manual and automatic indexing techniques.

There is an automatic indexing technique in commercial use
today; however, it is a "brute force" adaptation of KWIC. Basically,
the technique produces index key words by comparing words from the
source to words stored in an authority list. There are many
limitations to this system, such as correct handling of hyphenated
words, plural forms, and proper nouns, but the primary limitation is
that the list must contain a sufficient number of appropriate words
in order for a source to be adequately indexed. The size, speed, and
complexity of such a system should be obvious.
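
The authority-list comparison described above can be sketched in Python as follows. This is a deliberately naive illustration, not the commercial technique itself; the contents of `AUTHORITY_LIST` and the function name `auto_index` are hypothetical.

```python
import re

# Hypothetical authority list; a real one would be far larger.
AUTHORITY_LIST = {"ABLATION", "ADHESIVE", "INFORMATION", "RETRIEVAL"}

def auto_index(text, authority=AUTHORITY_LIST):
    """Return the authority-list words found in `text`, in alphabetical order."""
    words = re.findall(r"[A-Za-z]+", text.upper())
    return sorted(set(words) & authority)

title_terms = auto_index("Principles of Automated Information Retrieval")
plural_terms = auto_index("Adhesives for re-entry vehicles")
```

The second call returns no terms at all: "Adhesives" fails to match "ADHESIVE", and the hyphenated "re-entry" is split into two unlisted words, which illustrates the plural-form and hyphenation limitations noted above.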

Referring to Figure 1, it is seen that the indexing process
produces index records. The contents of the records vary widely and
are dependent upon the type of IR system (e.g., document, fact, or
reference). In addition to subject descriptors, the index may contain
the location of the information, source, author, reference to another
index record, or other information deemed pertinent by the system
designer. It will also be noted from the figure that the information
source, or information concerning the source, will also be stored in
the IR system. In the case of a large document such as a book, it
probably will not be stored in the computer but rather a reference or
abstract will be stored as a substitute. In some cases, the index
record itself will contain all of the information associated with an
information source. For example, an index record for a library book
may contain the book's location within the library; therefore, the
system will present the index record itself in answer to a user's
query.
III.  STORAGE

This  section  of  the  paper  contains  descriptions  of  various 
techniques  used  for  organizing  index  and  information  files  within  an 
IR  System's  storage  media.  There  will  be  no  discussion  of  storage 
devices  since  it  is  assumed  that  the  reader  is  already  familiar  with 
computer  equipment.  The  reader  is  aware,  of  course,  that  the 
system's  capacity,  cost,  and  response  time  are  greatly  affected  by 
the  selection  of  various  storage  media. 

A.  FILE ORGANIZATION

Organization of an index file or information file specifies the
positioning of the records in relation to one another within the file
along with the physical position of the file within the storage
media. Choice of a rule which governs file organization is dependent
upon desired response time, peak retrieval loads, system
reliability⁵, category of users, cost, rate of information change,
rate of system growth, and type of storage media. There are several
rules for file organization which are extensively used in IR systems
and they are presented here. These rules are equally applicable to
index and information files.

⁵ Ability to retrieve a maximum of information with a minimum of
false drops.

1.  Sequential Organization

The first method involves the sequential placement of records
within a file. The (i+1)st record follows (physically and/or
logically) the ith record. Examples include the alphabetical listing
of subject-indexing key words and the alphabetical arrangement of
employee records. This method is very conservative of memory space
since there is no need to supply pointers or links to indicate where
the next record in the file is located. On the other hand, additions
or deletions to the file are difficult to make. Let us suppose that we
desire to add a new name to the telephone book. Then all of the names
which follow the inserted name must be moved. Likewise, the deletion
of a name results in perturbation of the list. This type of
organization is most commonly used with magnetic tape where records
are searched sequentially.
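
The cost of adding a record to a sequentially organized file can be made concrete with a short Python sketch. The list `directory` and the function `insert_sequential` are hypothetical; a Python list is used here only as a stand-in for contiguous storage.

```python
import bisect

def insert_sequential(names, new_name):
    """Insert into a contiguous sorted file; return how many records had to move."""
    position = bisect.bisect_left(names, new_name)
    moved = len(names) - position  # every record after the insertion point shifts
    names.insert(position, new_name)
    return moved

# Hypothetical telephone-book entries, sequenced alphabetically.
directory = ["Adams", "Baker", "Jones", "Smith"]
moved = insert_sequential(directory, "Doe")
```

No links are stored, so the file is compact, but inserting "Doe" forces the two records that follow it to move.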

2.  Chaining

Another  technique  of  file  organization  is  called  "chaining" 
where  addresses   (links,    chains,    or  pointers)  are  stored  in  one  or 
more  fields  of  a  record  to  indicate  the  location  of  the  next  record 
within  the  file.     Recall  from  the  discussion  of  indexing  that  thesauri 
contain  "see"  references.     These  references  are  links  which  convey 
the  idea  of  chaining.     Chaining  is  a  particularly  effective  method  when 
used  in  a  crowded  memory  since  "referred  to"  records  may  be  placed 
in  any  available  space  within  the  memory  (unlike  the  rigid  sequential 
scheme).     Also,    the  utility  of  chaining  is  fully  realized  in  a  system 
which  experiences  a  high  rate  of  information  change.     This  method 
requires  more  memory  space  than  the  sequential  scheme  since  extra 
fields  must  be  appended  to  the  records  to  accommodate  the  links. 
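
A chained file can be sketched as records that each carry the address of the next record. In the Python below, the dictionary `storage` stands in for memory, its integer keys stand in for addresses, and the functions `insert_chained` and `walk` are hypothetical names; none of this is drawn from a particular system.

```python
# Hypothetical storage: slot address -> [name, address of next record or None].
storage = {
    10: ["Adams", 30],
    30: ["Jones", 20],
    20: ["Smith", None],
}
head = 10

def insert_chained(storage, head, free_slot, name):
    """Place `name` in any free slot and relink; no existing record moves."""
    previous, current = None, head
    while current is not None and storage[current][0] < name:
        previous, current = current, storage[current][1]
    storage[free_slot] = [name, current]
    if previous is None:
        return free_slot            # new record becomes the head of the chain
    storage[previous][1] = free_slot
    return head

def walk(storage, head):
    """Follow the links and return the names in logical order."""
    names = []
    while head is not None:
        names.append(storage[head][0])
        head = storage[head][1]
    return names

head = insert_chained(storage, head, free_slot=99, name="Doe")
```

Only one link field changes when "Doe" is added at address 99, at the price of an extra address field in every record.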
a.  Branching

An extension of the chaining technique is referred to as a
"branching structure." Branching is used to achieve versatility in
changing record entries, changing file structures, and conversion,
where possible, of variable-length records to fixed-length records.
A trivial example is shown in Figure 2, which exhibits the idea of
branching file structures.

Let  us  suppose  that  our  file  consists  of  all  military  flying  clubs 
in  the  United  States.     Each  record  consists  of  the  club's  name,    address 
(airport,    city,    state),    membership,    and  type  of  aircraft.     Obviously, 
these records are variable-length because the number of aircraft
owned  by  each  club  is  variable.     The  main  file  may  be  converted  to 
fixed-length  records  by  replacing  the  aircraft  type  fields  with  a  single 
address.     The  aircraft  types  could  then  be  included  in  another  fixed- 
length  file.      The  address  in  the  main  record  links  to  an  address  file 
which  in  turn  points  to  the  file  containing  the  aircraft  types.     Repetition 
of  aircraft  type  is  eliminated  from  the  main  records,    main  records 
are  fixed-length,    and  changes  are  made  only  to  the  address  file  not 
the  main  file  or  aircraft  file. 

Figure 3 exhibits another feature of this technique which replaces
all field entries in the main file (except the name) with addresses.
If it is later decided to add "county" to "city" and "state" then no
changes are required in the main file but a field must be added to
each of the "city-state" file records to absorb the new addition.
Figure 2
Conversion of Variable-Length Records to Fixed-Length Records
using the Branching Technique.

Figure 3
Addition of Records to an Existing Branching Structure
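
The flying-club example can be sketched in Python as three linked files. All club names, cities, aircraft types, and addresses below are invented for illustration, as are the names `main_file`, `address_file`, `aircraft_file`, and `aircraft_of`.

```python
# Hypothetical files for the flying-club example; all data are illustrative.
aircraft_file = {1: "Cessna 150", 2: "Piper Cherokee", 3: "Beechcraft T-34"}
address_file = {100: [1, 2], 102: [1, 2, 3]}   # the variable-length lists live here
main_file = [
    # Fixed-length main records: one link field replaces the aircraft-type fields.
    {"name": "Club A", "city": "Monterey", "state": "CA", "aircraft_link": 100},
    {"name": "Club B", "city": "Pensacola", "state": "FL", "aircraft_link": 102},
]

def aircraft_of(club):
    """Follow main record -> address file -> aircraft file."""
    return [aircraft_file[a] for a in address_file[club["aircraft_link"]]]

# A club acquires a new aircraft type: only the address file changes.
address_file[100].append(3)
```

After the change, the main record for the first club is untouched and each aircraft type is still stored once, which is the benefit claimed for the branching structure.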




3.  List Structuring

Although chaining and branching allow records to be scattered
throughout memory, their membership in a particular file is
maintained by some order of relative placement (e.g., employee
records logically linked in alphabetical order but physically
scattered throughout the file). List structuring does not require
that records be ordered in any specific manner within a file.
Further, the fields of a record may be physically separated and then
linked to form a logical record. The advantage of this form of
storage is the freedom of changing field content structure, record
content, and file structure. However, this method requires a great
deal more memory space than any other technique. In addition, the
retrieval process is relatively slow since more time is required to
gather the elements of a record together.

The  three  techniques  of  file  organization  described  above  are  all 
forms  of  list  structuring  and  each  demonstrates  a  different  degree  of 
structural  freedom.      Chaining  requires  that  fields  remain  contiguous, 
but  records,    while  remaining  ordered,    may  be  physically  separated. 
Branching  is  an  extension  of  chaining  allowing  fields  to  contain  address 
linkages  to  other  fields.     The  last  method  allows  any  ordering  and 
structuring  of  fields  and  records. 

B.  FILE SEQUENCING

It is important that records be sequenced (sorted) in some
manner for use in IR systems. Sequencing is normally based on
some particular attribute of a record (called a sort key) such as the
"name" field of an employee record. Selection of the sort key is
based on many considerations but the objective is to select the same
sort key as may be used in a retrieval request. Subordinate sort keys
may also be chosen when more than one record has the same primary
sort key value (e.g., several employees with the same last name).
Searching records which are ordered on the primary sort key is then
called an "ordered search."
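
Sequencing on a primary and a subordinate sort key can be illustrated briefly in Python. The employee records and the field names `last` and `first` are hypothetical.

```python
from operator import itemgetter

# Hypothetical employee records.
employees = [
    {"last": "Smith", "first": "Mary"},
    {"last": "Doe", "first": "John"},
    {"last": "Smith", "first": "Alan"},
]

# Primary sort key: last name; subordinate sort key: first name.
sequenced = sorted(employees, key=itemgetter("last", "first"))
names = [(e["last"], e["first"]) for e in sequenced]
```

The subordinate key only matters for the two records that share the primary key value "Smith".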
IV.  RETRIEVAL

The retrieval process essentially consists of searching the index
files and information files for information which satisfies a user's
query. If the information is found, it is sent to the user; if not,
the user is so informed. It should be noted that "searching" and
"retrieval" are not synonymous. "Searching" is a file access
operation used to locate records for matching against the query,
while "retrieval" is the actual output of information which satisfies
the query. However, use of the word "retrieval" here will imply the
entire operation of searching and retrieval.

As previously discussed in section II, indexing and retrieval are
counterparts since indexing refers to the structure of information for
input to the files, while retrieval is the process of locating and
displaying desired information. Therefore, the query language
employed by the system user must be compatible with the index
language employed by the system designer. It is important that the
query and index languages use the same vocabulary in order for the IR
system to understand the user's requests. The user must also be
familiar with the system's logic in order to formulate an intelligent
query. He must know if the system honors the use of Boolean
relationships ("and," "or," "not") and magnitude comparators
("greater than," "less than," etc.) as query terms.
Once the query is formulated it is input to the system's index
file. A matching process takes place at the index file where the terms
used in the query are matched against the index file records. Index
records which match the terms of the query are employed as locators
to direct the retrieval of data from the information file.
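
The matching process, including Boolean relationships between query terms, can be sketched with set operations. In the Python below, the contents of `index_file` and `information_file` and the functions `match` and `fetch` are hypothetical, and Python sets are assumed as the representation of matched index records.

```python
# Hypothetical index file (term -> locations) and information file (location -> record).
index_file = {
    "RETRIEVAL": {1, 2, 4},
    "AUTOMATED": {1, 3},
    "MANUAL": {4},
}
information_file = {1: "doc-1", 2: "doc-2", 3: "doc-3", 4: "doc-4"}

def match(term):
    """Match one query term against the index file."""
    return index_file.get(term.upper(), set())

def fetch(locations):
    """Use matched index records as locators into the information file."""
    return [information_file[loc] for loc in sorted(locations)]

both = fetch(match("retrieval") & match("automated"))     # "and"
either = fetch(match("automated") | match("manual"))      # "or"
excluding = fetch(match("retrieval") - match("manual"))   # "and not"
```

All of the Boolean work is done on locations drawn from the index; the information file is consulted only once the final set of locators is known.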

The  technique  used  in  searching  the  index  and  information  files 
is  governed  by  the  file  organization  (structure,    sequencing,    content, 
and  storage  medium).     In  the  ensuing  discussion  of  search  techniques 
it  should  be  borne  in  mind  that  whatever  technique  is  used  it  is  fixed 
within  the  IR  system.     Also,    the  interrelationship  between  search  plan 
and  file  organization  may  limit  file  accessibility  and  search  flexibility. 

A.  FULL-FILE SEARCH

One  search  plan  incorporates  a  full-file  search  where  every 
record  of  the  file  is  matched  (e.  g.  ,    the  value  of  the  query  term  is 
matched  against  the  value  of  the  sort  key).      This  plan  is  used  when 
the  order  of  records  within  a  file  is  unknown  (e.g.  ,    a  file  of  employee 
records  that  are  not  alphabetically  sorted).     In  this  case,    if  we  were 
searching  for  Doe's  record  and  found  Smith's  it  does  not  follow  that 
we  have  searched  too  far  since  the  records  are  not  collated.     In  ad- 
dition,   there  may  not  be  any  assurance  that  a  single  match  satisfies 
the  search  (more  than  one  Doe  in  the  file).      Therefore,    all  records 
within  a  file  must  be  searched. 
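
The full-file plan can be sketched in a few lines. This is only an illustration; the record layout and the names `full_file_search` and `employees` are assumptions and do not come from the system described in this paper.

```python
def full_file_search(records, key, query):
    """Compare every record with the query term; record order is irrelevant."""
    # No early exit is possible: the file is unsorted and may hold
    # several records that satisfy the query.
    return [rec for rec in records if rec[key] == query]


# Hypothetical, unsorted employee file.
employees = [
    {"last": "SMITH", "first": "ANN"},
    {"last": "DOE", "first": "JOHN"},
    {"last": "JONES", "first": "MARY"},
    {"last": "DOE", "first": "JANE"},
]
matches = full_file_search(employees, "last", "DOE")
```

Finding Smith before Doe says nothing about whether the search has gone too far, and both Doe records are returned because the scan never stops early.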




B.  SEQUENTIAL  SEARCH 

A  sequential  search  plan  might  be  used  when  the  records  are  not 
only  sequenced  but  sequenced  on  the  same  term  as  is  used  in  the  query. 
Sequential  searches  are  normally  used  in  conjunction  with  sequential 
access type storage devices. The records of a file are matched
sequentially until a successful match is made or until the value of the
sort key exceeds the value of the query term. In this case, searching
for Doe's record and locating Smith's record indicates not only that
the search has gone too far but also that no successful retrieval will
be made, since there is no Doe in the file.
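
A minimal sketch of the sequential plan follows, assuming the file is already sorted on the field used in the query. The function name and sample data are hypothetical.

```python
def sequential_search(sorted_records, key, query):
    """Scan a file sorted on `key`, stopping once the sort key passes the query."""
    matches = []
    for rec in sorted_records:
        if rec[key] > query:
            break                      # gone too far: no later record can match
        if rec[key] == query:
            matches.append(rec)
    return matches


# Hypothetical file, sorted on the last name.
sorted_employees = [
    {"last": "DOE", "first": "JANE"},
    {"last": "DOE", "first": "JOHN"},
    {"last": "JONES", "first": "MARY"},
    {"last": "SMITH", "first": "ANN"},
]
found = sequential_search(sorted_employees, "last", "DOE")
missing = sequential_search(sorted_employees, "last", "EVANS")
```

The search for EVANS ends at the JONES record, since every later sort key is larger still.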

C.  BINARY  SEARCH 

A  binary  search  plan  may  also  be  used  with  a  sequenced  file. 
The  term  "binary"  implies  that  a  two  valued  decision  is  made  after 
every  match  attempt.     The  search  begins  in  the  middle  of  the  file.     If 
the  first  match  attempt  is  unsuccessful  then  the  next  attempt  is  made 
one-quarter file length away from the first. The direction of the
subsequent search is dependent upon the result of comparing the value
of the query term and the sort key (e.g., if the sort key is greater
than the query term then move one-quarter file toward the beginning of
the file). Each successive move is then made one-half the length of the
preceding move. If there are n records in the file then there will be
approximately log2(n) moves to exhaust the file.
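
The halving procedure can be sketched as below. The sketch tracks the remaining interval with two bounds rather than a move length, which is an equivalent formulation chosen here for brevity; the names and the sample keys are assumptions.

```python
def binary_search(sort_keys, query):
    """Return (position, probes); position is None when the query is absent."""
    low, high = 0, len(sort_keys) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2        # middle of the remaining interval
        probes += 1
        if sort_keys[mid] == query:
            return mid, probes
        if sort_keys[mid] > query:
            high = mid - 1             # continue toward the beginning of the file
        else:
            low = mid + 1              # continue toward the end of the file
    return None, probes


keys = ["ADAMS", "BAKER", "DOE", "JONES", "KING", "SMITH", "YOUNG"]
```

With seven records no search needs more than three probes, in line with the log2(n) estimate.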

D.  DIRECT ACCESS SEARCHING

The  last  file  searching  technique  relies  upon  a  special  type  of 
index  file  called  an  inverted  index.     This  is  probably  the  most  common 
type  of  index  file  used  in  IR  systems.     The  inverted  file  records  con- 
sist of  the  descriptors  produced  during  the  indexing  process.      The 
descriptors  are  used  as  sort  keys  for  sequencing  the  records  within 
the  index.     Appended  to  each  descriptor  field  are  fields  which  contain 
addresses  of  the  associated  records  in  the  information  file.     Some 
type  of  search  plan  is  conducted  (usually  binary)  for  matching  descrip- 
tors (which  are  sort  keys)  to  the  query  term.     When  a  successful 
match  is  achieved,    the  addresses  of  the  appropriate  information  re- 
cords are  obtained  and  the  records  are  directly  retrieved. 
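
The following sketch illustrates the idea of an inverted index under simplifying assumptions: the information file is a Python list, a record's address is its list position, and the descriptors are supplied with each record. None of these names come from the paper.

```python
from bisect import bisect_left


def build_inverted_index(information_file):
    """Return index records (descriptor, [addresses]) sorted on the descriptor."""
    postings = {}
    for address, record in enumerate(information_file):
        for descriptor in record["descriptors"]:
            postings.setdefault(descriptor, []).append(address)
    return sorted(postings.items())


def lookup(index, information_file, query):
    """Binary-search the descriptors, then fetch the records directly by address."""
    descriptors = [descriptor for descriptor, _ in index]
    pos = bisect_left(descriptors, query)
    if pos == len(index) or index[pos][0] != query:
        return []
    return [information_file[address] for address in index[pos][1]]


# Hypothetical information file.
information_file = [
    {"title": "Report A", "descriptors": ["STEEL", "ALLOYS"]},
    {"title": "Report B", "descriptors": ["COPPER"]},
    {"title": "Report C", "descriptors": ["STEEL"]},
]
index = build_inverted_index(information_file)
steel = lookup(index, information_file, "STEEL")
```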

E.  COMBINED  SEARCH  PLANS 

The  above  treatment  of  search  plans  demonstrates  that  the 
techniques  are  dependent  upon  file  organization  but  plans  may  be  com- 
bined in  one  IR  system.      For  example,    a  binary  search  may  be  em- 
ployed in  the  index  file  to  locate  the  disk  and/or  track  which  contains 
the  desired  information  while  a  sequential  search  is  made  of  the  track 
for  the  requested  records. 
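
As an illustration of a combined plan, the sketch below binary-searches a small track index and then scans the selected track sequentially. The track layout, and the choice of storing the highest key of each track in the index, are assumptions made for the example.

```python
from bisect import bisect_left


def combined_search(tracks, query):
    """Return (track, slot) of the query key, or None if it is not in the file."""
    track_index = [track[-1] for track in tracks]   # highest key on each track
    t = bisect_left(track_index, query)             # binary search of the index
    if t == len(tracks):
        return None                                 # query exceeds every key
    for slot, key in enumerate(tracks[t]):          # sequential search of the track
        if key == query:
            return (t, slot)
        if key > query:
            break
    return None


# Hypothetical file of sorted keys, three keys per track.
tracks = [
    ["ADAMS", "BAKER", "CLARK"],
    ["DOE", "EVANS", "JONES"],
    ["KING", "SMITH", "YOUNG"],
]
```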

V.       RETRIEVAL SYSTEMS

This section of the paper contains a discussion of the primary
differences  between  reference,    document,    and  fact  retrieval  in  order 
to  provide  a  frame  of  reference  for  the  development  of  a  fact-retrieval 
system.      Reference  retrieval  is  treated  first  since  it  is  the  least 
complicated  of  the  three  types  of  information  retrieval. 

Queries  used  in  a  reference-retrieval  system  contain  only  the 
topic  for  which  information  is  desired  (e.  g.  ,    STEEL).      The  material 
provided  to  the  requestor  is  a  list  of  references  pertaining  to  his  topic. 

Document  retrieval  queries  are  narrower  in  scope  since  de- 
scriptive terms  are  used  to  modify  the  topic  (e.  g.  ,    STEEL,    chemical 
properties  of).     Documents  are  provided  to  the  requestor  which  contain 
the  desired  information. 

Fact-retrieval  systems  are  the  most  complicated  and  powerful 
of  all  since  they  are  capable  of  providing  specific  answers  to  specific 
questions. 

A.         REFERENCE  RETRIEVAL 

Reference retrieval is the first step taken by one in search of
specific information. As explained above, a reference-retrieval
system provides a user with a bibliography pertaining to the topic for
which specific information is sought. The second step in the search
for information is totally unrelated to the reference-retrieval system.
The user must examine the documents listed in the bibliography in
order to obtain the desired information. It is clear that in the first
step the user's search for information is narrowed from a search of
the entire "library" to a "shelf" in the library.

B.  DOCUMENT  RETRIEVAL 

The definition of document retrieval is not straightforward.
One  point-of-view  holds  document  retrieval  as  the  second  step  of 
reference  retrieval.     In  another  point-of-view,    it  is  a  special  case  of 
fact  retrieval.     What  this  author  regards  as  document  retrieval  may 
be  fact  retrieval  to  another.     The  definition  upheld  by  this  author  is 
the  retrieval  of  unprocessed  text  word-for-word  as  it  is  stored  in  the 
information  file.     An  example  would  be  requesting  a  specific  report 
from  a  technical  library. 

C.  FACT  RETRIEVAL 

Fact  retrieval  ranges  from  the  retrieval  of  processed  text 
stored  in  an  information  file  to  the  retrieval  of  specific  answers  to 
specific questions. The more powerful end of the spectrum is referred
to as "question answering." Reference 10 contains an excellent
treatment of the general characterizations, limitations, capabilities,
and feasibility of the question-answering type of fact-retrieval systems.
Reference 11 contains a practical example of a question-answering
program.

Confusion  arises  at  the  low  end  of  the  fact-retrieval  spectrum 
where  it  is  difficult  to  distinguish  the  difference  between  document 
and fact retrieval. One point should help clarify the difference.
Document-retrieval  systems  possess  only  rote  memory  which  means 
that  their  capability  is  limited  to  the  display  of  information  word-for- 
word  as  it  is  stored  in  the  data  base.     Fact-retrieval  systems  possess 
the  capability  of  manipulating  data  stored  in  the  data  base  into  a  form 
which  best  satisfies  the  user's  request. 

VI.     DATA STRUCTURE FOR A FACT-RETRIEVAL SYSTEM

This section contains the description of a data structuring technique
proposed by Kildall [Ref. 3] for use in a general-purpose
fact-retrieval system. Specific usage of the system depends in part upon
the  type  of  information  stored  in  its  files.     However,    the  nature  of  the 
system  is  the  processing  of  data  to  provide  a  user  with  specific  answers 
to  his  queries.     Therefore,    the  system  approaches  "question  answering.  " 
The  data-structuring  technique  employs  the  basic  concept  of  hierarch- 
ical classification  which  divides  a  topic   (also  referred  to  as  a  universe 
of  discourse)  into  its  class  structure  and  correlates  the  data  elements 
of  the  information  file  to  a  tree-type  classificatory  structure. 

A  treatment  of  the  retrieval  process  is  also  provided  here  since 
the  query  format  is  directly  related  to  the  data-structuring  technique. 

This  section  is  expressly  devoted  to  a  discussion  of  the  data- 
structuring  concept  while  section  VII  contains  the  description  of  the 
general-purpose  fact-retrieval  system  which  employs  the  proposed 
technique.     The  system  was  designed  for  the  primary  purpose  of  in- 
vestigating the  potential  of  the  data-structure  concept  and  not  for 
production  purposes. 

As  previously  discussed,    fact-retrieval  systems  range  from  the 
manipulation  of  processed  text  to  "question  answering."    The  system 
described  herein  maintains  a  position  in  the  middle  of  this  continuum. 
The  term  "general  purpose"  used  here  does  not  necessarily  mean  that 
the system may be utilized throughout the full range of fact retrieval.
Rather,    it  means  that  the  system  will  accommodate  files  which  contain 
different  types  of  information. 

A.         DATA  STRUCTURE 

The  structure  employed  for  indexing  data  incorporates  the  con- 
cept of  hierarchical  classification  which  allows  the  user  to  enter  the 
data  base  in  a  number  of  ways  in  order  to  extract  desired  information. 
A  universe  of  discourse  is  structured  in  terms  of  "classes"  and  a 
hierarchy  of  classes  is  established  onto  which  the  associated  data 
elements  are  mapped.      For  example,    assume  that  a  universe  of  dis- 
course consists  of  personnel  records.     The  records  consist  of  names, 
addresses,    and  telephone  numbers  which  are  members  of  the  classes 
"NAME",    "ADDRESS,  "  and  "TELEPHONE  NUMBER.  "    "NAME"  is 
further  divided  into  the  subclasses   "LAST,  "  "FIRST,  "  and  "MIDDLE" 
while  "ADDRESS"  contains  "STREET,  "  "CITY,  "  and  "STATE.  " 
The  data  structure  is  then  represented  by  a  classificatory  tree  with 
the  data  elements  related  to  the  classes  contained  in  the  tree.     The 
data  element  "DOE,  "  for  example,    is  identified  as  a  member  of  the 
class   "LAST,  "  and  the  class   "LAST"  is  a  member  of  "PERSONNEL 
RECORD.  "    All  data  elements  of  a  structure  are  identified  in  this 
fashion. 

1.     Class  Structure  Representation 

Class  structures  are  represented  by  parenthesized  expres- 
sions which  are  used  to  define  the  structure  of  the  classificatory  tree. 
The technique of employing parentheses to define structures is similar
to that technique employed in LISP S-expressions [Ref. 12]. Punctuation
symbols used in the expressions are the left parenthesis, the
right  parenthesis,    and  the  comma.     The  parentheses  are  used  to  en- 
close those  classes  which  are  directly  related  to  a  superclass  while 
the  comma  is  used  to  separate  the  classes  within  the  parenthesized 
unit.      Units  within  an  expression  are  separated  by  commas  and  the 
entire  expression  itself  is  enclosed  by  parentheses.     As  demonstrated 
in the preceding section, "PERSONNEL RECORD" consists of the
classes "NAME," "ADDRESS," and "TELEPHONE NUMBER." This
definition is called the format definition and is the foundation for the
construction of the classificatory tree. Format definitions are
represented by the parenthesized expression shown below.

PERSONNEL RECORD (NAME, ADDRESS, TELEPHONE NUMBER)

"NAME" and "ADDRESS" were further divided into subclasses
and the expressions below show the parenthesized forms for "class
definitions."

NAME  (LAST,    FIRST,    MIDDLE) 
ADDRESS  (STREET,    CITY,    STATE) 

Subclasses  may  also  be  subdivided  and  this  process  is  replicated 
to  fully  define  the  class  structure  of  the  universe  of  discourse.      Figure 
4  graphically  demonstrates  the  class   structuring  process,    the  fully 
parenthesized  expression  for  the  class  structure,    and  the  associated 
classificatory tree. Although the above example does not include a
subdivision for the class "STREET," one is shown in the tree structure
to demonstrate a third level of class replication.

Figure 4
Parenthesized Class Expressions and Associated Tree
Structure for the Hierarchical Classification of Data
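
The construction of a classificatory tree from format and class definitions can be sketched as follows. The nested-dictionary representation and the function names are assumptions made for illustration; the internal representation actually used by the system is described in section VII.

```python
def parse_definition(line):
    """Split 'NAME (LAST, FIRST, MIDDLE)' into ('NAME', ['LAST', 'FIRST', 'MIDDLE'])."""
    name, _, rest = line.partition("(")
    members = [m.strip() for m in rest.rstrip().rstrip(")").split(",")]
    return name.strip(), members


def build_tree(root, definitions):
    """Return the subclasses of `root` as a nested dictionary (empty dict = end node)."""
    return {child: build_tree(child, definitions)
            for child in definitions.get(root, [])}


def parenthesize(tree):
    """Produce the fully parenthesized class expression for a tree."""
    parts = [parenthesize(sub) if sub else name for name, sub in tree.items()]
    return "(" + ", ".join(parts) + ")"


lines = [
    "PERSONNEL RECORD (NAME, ADDRESS, TELEPHONE NUMBER)",
    "NAME (LAST, FIRST, MIDDLE)",
    "ADDRESS (STREET, CITY, STATE)",
]
definitions = dict(parse_definition(line) for line in lines)
tree = build_tree("PERSONNEL RECORD", definitions)
expression = parenthesize(tree)
```

Here `expression` is the fully parenthesized form of the class structure, with each defined class replaced by its parenthesized subclasses.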

2.      Data Representation

Once the class structure is defined, the associated data may
be mapped directly onto the structure. Data representation is identical
to  the  class  expression  as  shown  below. 


((DOE, JOHN, JAMES), (203 ELM STREET, MONTEREY, CA.), 384-9363)

((LAST, FIRST, MIDDLE), (STREET, CITY, STATE), (TELEPHONE NO.))

NAME, ADDRESS, TELEPHONE NO.

EMPLOYEE RECORD


Repeated data elements within a record are easily handled by
properly parenthesizing the record. For example, two phone numbers
for John Doe would be represented by:

((DOE, JOHN, JAMES), (203 ELM STREET, MONTEREY, CAL.),
(384-9363, 384-6214))

The  class  membership  of  each  data  element  in  the  record  is 
clearly  defined  by  the  parenthesized  expression. 
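
One way to see how the parentheses fix class membership is to parse a record into nested lists and pair each element with its class. The sketch below is an assumption about how this could be done, not the processor described later; the class structure is written out by hand as nested lists.

```python
def parse_expression(text):
    """Parse a fully parenthesized record into nested lists of strings."""
    stack = [[]]                 # stack[0] receives the single outermost unit
    token = []
    for ch in text:
        if ch == "(":
            stack.append([])
        elif ch in ",)":
            item = "".join(token).strip()
            token = []
            if item:
                stack[-1].append(item)
            if ch == ")":
                finished = stack.pop()
                stack[-1].append(finished)
        else:
            token.append(ch)
    return stack[0][0]


def label(classes, data):
    """Pair every data element with the end-node class it belongs to."""
    if isinstance(classes, str):                 # end node; may hold repeated elements
        values = data if isinstance(data, list) else [data]
        return [(classes, value) for value in values]
    pairs = []
    for cls, item in zip(classes, data):
        pairs.extend(label(cls, item))
    return pairs


classes = [["LAST", "FIRST", "MIDDLE"], ["STREET", "CITY", "STATE"], "TELEPHONE NUMBER"]
record = parse_expression(
    "((DOE, JOHN, JAMES), (203 ELM STREET, MONTEREY, CA.), (384-9363, 384-6214))"
)
pairs = label(classes, record)
```

Both telephone numbers are labeled as members of "TELEPHONE NUMBER" because they share one parenthesized unit.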

3.      System  Utility 

The  utility  of  hierarchical  classification  in  association  with 
parenthesized  expressions  is  realized  by  the  user  in  three  ways: 

1.  The indexing techniques presented in section II require the
user  to  conform  to  the  language  devised  by  the  system  designer  for 
the  retrieval  of  information.     The  user  does  not  have  the  option  of 
defining  the  indexing  language  that  best  suits  his  particular  needs  but 
must  be  satisfied  with  the  indexing  technique  employed  to  best  satisfy 
the  needs  of  all  users.     In  contrast,    this  system  allows  each  user  or 
user  group  to  define  his  own  indexing  language  by  defining  the  class 
structure  associated  with  the  data  he  is  most  concerned  with.     In  other 
words,    the  system  will  accept  a  mix  of  data  allowing  each  user  or  user 
group  to  have  his  own  retrieval  system  within  a  retrieval  system. 
Each  user  or  user  group  must  define  the  class  structure  of  his  data. 
For  example,    a  business-oriented  system  might  consist  of  a  data  base 
partitioned  into  employee  records,    pay  records,    stock  inventory,    etc. 
Such  a  system  would  simultaneously  serve  the  needs  of  many  users. 

2.  The  user  has  the  capability  of  entering  the  data  structure  in 
several  ways  to  extract  desired  information.     In  the  personnel  record 
example,    the  user  may  retrieve  complete  records  which  satisfy  cer- 
tain search  keys,    or  retrieve  only  the  names  of  personnel,    or  retrieve 
the  phone  number  of  a  particular  person,    and  so  on. 

3.  The  classification  scheme  could  serve  as  an  intermediate 
language  between  the  query  processor  and  the  retrieval  system. 


B.         RETRIEVAL  PROCESS 
1.     Query  Format 

Queries  are  presented  to  the  system  utilizing  the  same  for- 
mat as  class  expressions.     The  fully  parenthesized  expression  contains 
search  keys  and  blank  positions  which  specify  the  information  to  be 
supplied  to  the  user.     The  retrieval  processor  will  fill  in  the  blank 
positions  with  all  of  the  information  contained  in  the  data  base  which 
satisfies the search keys. The expression must conform identically to
the fully parenthesized expression used to represent the class structure:

((DOE, JOHN, _), (_, _, _), _)

In the example above, the system will identify the class membership
of each search key and blank position through the classificatory
tree constructed from the class expression. A search is then instituted
for all records which contain an occurrence of "DOE" as a member of
the class "LAST" and "JOHN" as a member of the class "FIRST."
Information  is  extracted  from  those  appropriate  records  to  fill  the 
blank  positions  of  the  query.     The  user  may  broaden  or  narrow  the 
amount  of  information  retrieved  by  the  number  and/or  class  of  search 
keys  used  in  the  query.     A  query  containing  only  the  search  key 
"CALIFORNIA"  could  produce  a  greater  amount  of  information  than  a 
query  which  has  only  one  blank  position. 

2.     Boolean  Expressions 

The ability to use Boolean expressions such as "and," "or,"
"not," etc., is desirable in any information retrieval package. However,
the degree to which Boolean expressions may be used is left to
the prerogative of the system designer in satisfying user needs. The
use of Boolean "and" is accepted by the retrieval processor in this
system and is identified by the ampersand:

((_, _, _), (_, MONTEREY & MARINA, CALIFORNIA), _)

In  this  case,    the  names,    street  addresses,    and  phone  numbers 
of  all  personnel  who  live  in  Monterey,    California  and  Marina, 
California  would  be  produced. 

Boolean "or" is not directly supported in this system, but a similar
effect is obtained through alphabetic and numeric range requests.

3.     Alphabetic and Numeric Ranges

Alphabetic  and  numeric  range  requests  are  identified  by  the 
colon.     Examples  of  range  requests  are  exhibited  below. 

((A : D, _, _), (_, MONTEREY, CALIFORNIA), _)

The retrieval processor identifies an alphabetic range request for all
data elements which are members of the class "LAST" and which have
as a first letter A, B, C, or D. The records of all personnel who live
in Monterey, California and whose last names begin with A through D
inclusive would be produced.

As  shown  immediately  above,    the  system  does  not  restrict  the 
use  of  alphabetic  or  numeric  ranges  to  single  letters  but  any  number  of 
characters  may  be  used  and  any  number  of  range  requests  are  possible 
within  a  single  query. 

The  above  discussion  is  also  true  for  numeric  range  requests. 
For  example,    the  user  desires  complete  records  for  all  those  personnel 
who  have  specific  telephone  exchanges: 

((_,_,_),      (_,   _,    _),    372:394) 
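
The query forms of this section can be illustrated with a small matcher. It is a sketch under several assumptions: records and queries are flattened to their end-node positions, `None` stands for a blank position, the ampersand lists several acceptable values (following the Monterey and Marina example), and a range is tested on the leading characters of the element. The sample records are invented.

```python
def term_matches(term, value):
    """Test one data element against one query position."""
    if term is None:                              # blank position: no constraint
        return True
    if "&" in term:                               # several acceptable values
        return value in [t.strip() for t in term.split("&")]
    if ":" in term:                               # inclusive range on leading characters
        low, high = (t.strip() for t in term.split(":"))
        return low <= value[:len(low)] and value[:len(high)] <= high
    return value == term                          # ordinary search key


def retrieve(query, records):
    """Return, for each matching record, the elements that fill the blank positions."""
    hits = []
    for rec in records:
        if all(term_matches(t, v) for t, v in zip(query, rec)):
            hits.append([v for t, v in zip(query, rec) if t is None])
    return hits


# Positions: LAST, FIRST, STREET, CITY, STATE, TELEPHONE NUMBER (hypothetical data).
records = [
    ["DOE", "JOHN", "203 ELM STREET", "MONTEREY", "CALIFORNIA", "384-9363"],
    ["BAKER", "ANN", "12 OAK AVENUE", "MARINA", "CALIFORNIA", "372-1100"],
    ["SMITH", "TOM", "9 PINE ROAD", "SALINAS", "CALIFORNIA", "424-5555"],
]
by_name = retrieve(["DOE", "JOHN", None, None, None, None], records)
by_city = retrieve([None, None, None, "MONTEREY & MARINA", "CALIFORNIA", None], records)
by_letter = retrieve(["A : D", None, None, None, None, None], records)
by_exchange = retrieve([None, None, None, None, None, "372:394"], records)
```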

VII.       SYSTEM STRUCTURE

This  section  discusses  the  internal  design  of  the  general-purpose 
fact-retrieval  system  employing  the  data-structure  technique  previously 
explained.     The  system  was  implemented  on  the  Naval  Postgraduate 
School's  IBM  360  Model  67  Computer  and  is  an  interactive  system  under 
control  of  the  Cambridge  Monitor  System  (CP/CMS)  [Ref.    13]. 

A.         DATA  FILES 

Data  files  are  stored  on  punched  cards  and  consist  of  the  following 
three  types: 

1.  Format  definition  cards.      These  cards  define  the  class 
structure  for  each  universe  of  discourse  to  be  included  in  the  data 
base.     An  example  of  a  format  definition  card  is: 

EMPLOYEE  RECORD  (NAME,  ADDRESS,    AGE,    CHILDREN) 

2.  Class  definition  cards.     These  cards  further  define  the 
structure  of  the  classes  contained  in  the  format  definition.     Examples 
of  class  definition  cards  are: 

NAME  (LAST,    FIRST) 
ADDRESS  (STREET,    CITY,    STATE) 

3.  Data records. The data records contain the data elements
associated with the universe of discourse and are fully parenthesized
expressions. An example of a data record is:

EMPLOYEE RECORD ((DOE, JOHN), (203 ELM STREET, MONTEREY, CA.),
(48), (MARY, SALLY))

Format definitions, class definitions, and data records may also
be  entered  into  the  system  via  on-line  terminal.      For  a  large-scale 
data  base,    the  data  records  could  be  stored  in  unstructured  form  on  a 
back-up  storage  device  such  as  magnetic  tape.      Structuring  of  records 
would  be  accomplished  under  program  control  according  to  pre-stored 
format  and  class  definitions. 

B.         TREE-TYPE  DATA  STRUCTURES 

A  tree-type  data  structure  is  employed  to  represent  the  hierarch- 
ical classification  of  a  universe  of  discourse.      The  tree-structuring 
process  described  later  in  this  section  employs  data  cells  to  represent 
nodes  within  a  tree  and  the   "chaining"  technique  to  order  the  cells  into 
tree  structure  form. 

1.     Data  Cells 

Data cells available to the tree-structuring processor consist
of three fields. Each field and its function are described
below:

a.  The  identifier  field,    referred  to  as  "TOP,  "  contains  the 
storage  address   (pointer)  of  the  data  or  class  entity  which  the  data  cell 
represents. 

b.  The  right  link  field,  referred  to  as  "RIGHT,  "  contains  a 
pointer  which  is  used  to  chain  the  data  cell  to  another  data  cell  on  the 
same  level  of  the  tree. 

c.  The  down  link  field,    referred  to  as   "DOWN,  "  contains 
a pointer which is used to chain the data cell to another data cell
located in a lower level of the tree.

Figure 5 demonstrates the use of data cells. A zero in a link field
signifies "no link" or a null field.
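
A data cell of this kind can be modeled as below. The sketch is illustrative only: it uses Python object references in place of storage addresses and `None` in place of the zero that marks a null link, and the names `Cell` and `siblings` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    top: int                          # TOP: identifies the class or data entity
    right: Optional["Cell"] = None    # RIGHT: next cell on the same level
    down: Optional["Cell"] = None     # DOWN: first cell on the next lower level


def siblings(cell):
    """Follow the RIGHT chain starting at `cell`."""
    while cell is not None:
        yield cell
        cell = cell.right


# A header cell (entity 1) whose lower level holds entities 2 and 3.
header = Cell(top=1)
header.down = Cell(top=2, right=Cell(top=3))
level = [cell.top for cell in siblings(header.down)]
```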

2.      Structuring  Process 

Empty  data  cells  are  constructed  in  core  storage  through 
list  structuring  techniques  and  are  stored  in  an  area  available  to  the 
tree-structuring routine. The reading of a format definition card
initiates the structuring process. The format name (e.g., EMPLOYEE
RECORD) and the class names contained on the card are extracted
and moved into storage (a discussion of this process is deferred to a
later section). One cell for the format name plus one cell for each
class name contained on the card are retrieved and tree
structuring commences. The first cell in the tree structure is called
a  "header"  and  serves  to  identify  the  format  name  of  the  tree.     Each 
of  the  classes  contained  in  the  format  definition  is  assigned  to  a  data 
cell  and  the  cells  are  chained  together.     Figure  6  shows  the  structure 
representing  the  format  definition: 

EMPLOYEE  RECORD  (NAME,    ADDRESS,    AGE,    CHILDREN) 

Before  completing  the  discussion  of  tree  structuring  it  is  import- 
ant to  note  that  class  definitions  throughout  the  various  universes  of 
discourse  in  the  data  base  must  be  consistent.     That  is  to  say,    if  the 
class  called  "NAME"  is  defined  as   (LAST,    FIRST)  then  every  occur- 
rence of  "NAME"  must  consist  of  the  classes   "LAST"  and  "FIRST.  " 
If this is not done, confusion arises during the retrieval process when
the processor attempts to identify the class memberships of data
elements. Therefore, as each format definition is read, a search is
conducted of all previously constructed trees to determine whether or
not each of the classes contained in the definition being processed
has been previously used. If a class has been previously used then
the tree structure representing the class is appended to the tree being
built. If a class has not been previously used then a class definition
card must be submitted to the tree-structuring processor.

EMPLOYEE RECORD (NAME, ADDRESS, AGE, CHILDREN)

The numbers in the TOP fields correspond to:

1  EMPLOYEE RECORD
2  NAME
3  ADDRESS
4  AGE
5  CHILDREN

Figure 6
Tree Structure Composed of Data Cells

After the format definition card has been processed any class
definition cards associated with the structure are processed. Figure 7
contains a completed tree structure for:

EMPLOYEE RECORD (NAME, ADDRESS, AGE, CHILDREN)
NAME (LAST, FIRST)
ADDRESS (STREET, CITY, STATE)
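
The chaining of cells into a tree for the definitions above might be sketched as follows. For readability the TOP field holds the class name itself rather than a storage address, and the reuse of previously built trees is omitted; both simplifications, and all function names, are assumptions.

```python
class Cell:
    def __init__(self, top):
        self.top, self.right, self.down = top, None, None


def chain(names):
    """Create one cell per name and link the cells through RIGHT."""
    first = prev = None
    for name in names:
        cell = Cell(name)
        if prev is None:
            first = cell
        else:
            prev.right = cell
        prev = cell
    return first


def build_tree(format_name, definitions):
    """Build the header cell and hang each class definition below its class."""
    header = Cell(format_name)
    header.down = chain(definitions[format_name])
    pending = [header.down]
    while pending:
        cell = pending.pop()
        while cell is not None:
            if cell.top in definitions:            # class has a class definition
                cell.down = chain(definitions[cell.top])
                pending.append(cell.down)
            cell = cell.right
    return header


def end_nodes(cell):
    """List the lowest-level classes, left to right."""
    found = []
    while cell is not None:
        if cell.down is None:
            found.append(cell.top)
        else:
            found.extend(end_nodes(cell.down))
        cell = cell.right
    return found


definitions = {
    "EMPLOYEE RECORD": ["NAME", "ADDRESS", "AGE", "CHILDREN"],
    "NAME": ["LAST", "FIRST"],
    "ADDRESS": ["STREET", "CITY", "STATE"],
}
header = build_tree("EMPLOYEE RECORD", definitions)
leaves = end_nodes(header.down)
```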

C.         INDEX  FILES 

The  system  incorporates  an  index  file,    called  the  master  index, 
which  demonstrates  many  of  the  characteristics  and  advantages  of  an 
inverted  index.      The  master  index  contains  format  names,    class   names, 
and  data  elements.      Each  entry  in  the  index  has  a  pointer  associated 
with  it  which  links  the  entry  to  a  tree  structure,    data  record,    or 
further  information  concerning  the  entry.      The  retrieval  process  is 
always  initiated  at  the  master  index  since  it  is  the  agent  which  directs 
the  search  for  information  in  response  to  a  user's  query. 

EMPLOYEE RECORD (NAME, ADDRESS, AGE, CHILDREN)
NAME (LAST, FIRST)
ADDRESS (STREET, CITY, STATE)

The numbers in the TOP fields correspond to:

1  EMPLOYEE RECORD
2  NAME
3  ADDRESS
4  AGE
5  CHILDREN
6  LAST
7  FIRST
8  STREET
9  CITY
10  STATE

Figure 7
Tree Structure for the Format: "EMPLOYEE RECORD"

1.  Characteristics of the Master Index

Conceptually,    the  master  index  is  a  large  matrix  consisting 
of  fixed-length  records   (matrix  rows),    each  containing  eight  fields 
(matrix  columns),    as  shown  in  Figure  8.     The  first  four  characters  of 
format  names,    class  names,    and  data  elements  are  stored  in  the  first 
four fields of the index. Entries which contain more than four
characters are then stored in a sequential storage area reserved for
variable-length records. The remaining four fields of each index
record  contain  information  concerning  the  type  of  entry  (e.g.  ,    format 
name,    class  name,    or  data  element),    the  sequential  store  address  of 
the  full  character  representation  of  the  entry,    if  any,    pointers  to  infor- 
mation-bearing data  cells,    and  other  information  useful  to  the  retrieval 
processor. 
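
A fixed-length index record of this shape could be modeled as an eight-element list, as in the sketch below. The padding of short entries with blanks, the use of a list position as a sequential store address, and the name `make_record` are assumptions; the last two pointer fields are left empty here.

```python
def make_record(entry, entry_type, store):
    """Return an eight-field index record for `entry`."""
    chars = list(entry[:4].ljust(4))      # fields 1-4: first four characters
    full_text = None
    if len(entry) > 4:                    # longer entries also go to the sequential store
        store.append(entry)
        full_text = len(store) - 1        # field 6: sequential store address
    # Fields 7 and 8 (pointers to data cells) are filled in later.
    return chars + [entry_type, full_text, None, None]


sequential_store = []
address_record = make_record("ADDRESS", "C", sequential_store)
age_record = make_record("AGE", "L", sequential_store)
```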

2.  Constructing  the  Master  Index 

The  first  record  of  the  master  index  is  reserved  as  a  table 
of  all  format  names  contained  in  the  data  base.  The  first  record  con- 
tains the  address  of  the  first  data  cell  (identical  to  the  data  cells  used 
in  tree  structuring)  in  a  chain  of  cells  and  each  cell  contains  the 
address  of  a  format  name  located  in  sequential  storage.  Through  this 
record  a  user  may  quickly  determine  the  partitioning  of  the  data  base. 
Figure  9  demonstrates  the  idea. 

Format  names  are  entered  in  the  index  and  linked  to  their 
definitions  which  are  located  in  sequential  storage.     Each  of  the  clas- 
ses contained  in  the  format  definitions  are  also  stored  in  the  index. 

COLUMN(S)

1-4 :  First 4 characters of the entry
5   :  No. if the entry is a format name
       "C" if the entry is a class name
       "L" if the entry is the lowest level class in a tree structure
       "D" if the entry is a data element
6   :  Pointer to the full character representation in sequential store
7   :  Pointer to associated chain of data cells if the entry is
       classified "L", otherwise pointer to sequential store
8   :  Pointer to associated data cell in the tree structure if the
       entry is a class or format.
       Pointer to associated chain of data cells if the entry is a
       data element.

Figure 8
Representation of the Master Index

Figure 9
Reserved Record in the Master Index for Format Names with
Associated Data Cells and Format Names in Sequential Store

Associated with each class entry in the index is a string of data cells
which  contain  two  items  of  information  concerning  the  class: 

a.  The  first  field  contains  the  number  of  the  data  record  which, 
in  turn,    contains  an  occurrence  of  the  class.      (This  information  is 
added  when  the  data  records  are  read  and  is  discussed  later.  ) 

b.  The  second  field  contains  a  number  corresponding  to  the  for- 
mat name  which  contains  this  class  entry. 

A  class  may  be  used  in  any  number  of  different  format  definitions 
but  its  structure  must  be  consistent  in  every  occurrence.      Therefore, 
regardless  of  the  number  of  format  definitions  which  contain  a  given 
class,    there  is  only  one  index  record  for  the  class.      The  data  cells 
appended  to  the  class  entry  provide  the  retrieval  processor  with  data 
such  as  the  format  definitions  in  which  the  class  appears.     Among 
other  things,    information  pertaining  to  the  class  entries  provides  the 
retrieval  processor  with  the  capability  of  quickly  abandoning  a  search 
when  a  user  requests  information  through  a  class  which  is  not  a 
member  of  the  format  being  queried. 
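
The cells appended to a class entry, and the early abandonment they permit, can be sketched as follows. The dictionary of lists stands in for the chain of data cells, and the record and format numbers are invented for the example.

```python
from collections import defaultdict

# class name -> list of (data record number, format number) cells
class_cells = defaultdict(list)


def note_occurrence(class_name, record_number, format_number):
    """Append one cell to the single index entry kept for the class."""
    class_cells[class_name].append((record_number, format_number))


def class_in_format(class_name, format_number):
    """True if the class occurs in the format being queried."""
    cells = class_cells.get(class_name, [])
    return any(fmt == format_number for _, fmt in cells)


note_occurrence("NAME", 1, 1)     # NAME occurs in record 1 of format 1
note_occurrence("NAME", 7, 2)     # and in record 7 of format 2
note_occurrence("AGE", 1, 1)      # AGE occurs only in format 1
```

A query against format 2 that names the class "AGE" can be abandoned at once, since `class_in_format("AGE", 2)` is false.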

Class definitions are processed in a manner very similar to format
definition processing. The class being defined is entered in the
index and the definition is stored as read in the sequential store. The
system returns the sequential store address and enters it in the index
record. Appropriate data cells are appended to the index and the
class structure is added to the classificatory tree. When the tree is
completed, those classes which are end nodes in the classificatory
tree (e.g., LAST, FIRST, STREET, CITY, STATE, AGE, and
CHILDREN in EMPLOYEE RECORD) are identified and their index
records are flagged. This is done to ensure that elements in the data
records are mapped onto the tree structure according to their proper
class membership.
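
The identification of end nodes can be illustrated with a short sketch.
It is written in Python rather than the FORTRAN of the program listing,
and the dictionary layout and the function name end_nodes are
assumptions made for illustration only; the class names are those of
the EMPLOYEE RECORD example.

```python
# Format and class definitions keyed by name. The names come from the
# EMPLOYEE RECORD example; storing them in a dict is an assumption.
definitions = {
    "EMPLOYEE RECORD": ["NAME", "ADDRESS", "AGE", "CHILDREN"],
    "NAME": ["LAST", "FIRST"],
    "ADDRESS": ["STREET", "CITY", "STATE"],
}


def end_nodes(name, definitions):
    """Return the end nodes below `name`, in left-to-right order."""
    children = definitions.get(name)
    if not children:
        # A class with no definition of its own is an end node.
        return [name]
    leaves = []
    for child in children:
        leaves.extend(end_nodes(child, definitions))
    return leaves


print(end_nodes("EMPLOYEE RECORD", definitions))
```

The left-to-right order of the result is what allows the elements of a
data record to be matched, position by position, with their classes.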

As each data record is read into the system it is assigned a
unique number and placed in the sequential store. Each element within
the record is examined to determine its class membership and the
master index is searched to determine if the element was previously
entered by another data record. The possibility of a data element
appearing in more than one record exists if the data base contains
similar formats such as employee records and pay records. In addition,
a data element may be a member of more than one class, such
as the occurrence of "JOHN" as a member of both classes "FIRST"
and "CHILDREN." It is highly desirable that there be only one entry
in the master index for those elements which occur more than once.
Unique entries in the index guarantee that when an item is located in
the index, the search process is complete and successful. Additionally,
the need for combined search plans is eliminated. Specific record
and class membership information for each data element entered in
the index is resolved by appending data cells to the master index entry.
The data cells contain the record number(s) from which the element
was extracted and its class membership(s). Assuming that a data
element occurs several times in the data base, the master index would
still contain only one record for the element. The record contains all
of the information pertinent to the retrieval process. The technique,
relevant to both class and data entries, results in two important
savings:

1.  A  significant  reduction  of  storage  space  is  realized  (if  an 
element  occurs  several  times)   since  multiple  entries  in  the  master 
index  require  more  storage  space  than  a  single  record  and  its 
associated  data  cells. 

2.  A  significant  reduction  in  search  time  is  realized  since  multiple 
entries  require  the  retrieval  processor  to  conduct  a  full-file  search 
each  time  it  enters  the  master  index. 
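
The single-entry technique can be sketched as follows. This is a
minimal Python illustration, not the FORTRAN implementation; the names
master_index and enter_element, and the record numbers used, are
assumptions.

```python
from collections import defaultdict

# One index entry per distinct element. Each entry carries a chain of
# data cells of the form (record number, class membership).
master_index = defaultdict(list)


def enter_element(element, record_number, class_name):
    """Append a data cell to the element's single index entry."""
    master_index[element].append((record_number, class_name))


# Assumed sample input: "JOHN" occurs in two records, under two classes.
enter_element("JOHN", 1, "FIRST")
enter_element("DOE", 1, "LAST")
enter_element("JOHN", 2, "CHILDREN")

print(len(master_index))     # number of index entries
print(master_index["JOHN"])  # data cells chained to the one JOHN entry
```

Three elements were read but only two index entries exist, so a single
successful lookup of "JOHN" yields every record and class in which it
occurs.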

3.      Data  Record  Table 

Cells appended to each data element stored in the master
index do not contain the sequential store addresses of the records
from which the data elements were extracted. This information is
stored separately in a table referred to as a data record table. The
data record table augments the information contained in the master
index and is composed of fixed-length records as shown in Figure 10.
Each table record consists of three fields which contain:

a.  The  unique  data  record  number. 

b.  Format  membership  of  the  data  record. 

c.  Sequential  store  address  of  the  data  record. 

The data record table serves two functions:

a.  The retrieval processor bypasses the master index and directly
enters the data record table to satisfy requests for all data records
which are members of a particular universe of discourse.

b.  The table is also utilized for queries other than those which
request "all data records." The retrieval processor searches the
master index to determine the data records which satisfy a user's
request. Then the processor enters the data record table and extracts
the sequential store addresses of the records. The sequential store
addresses are passed to the "output" section of the retrieval processor.

[Figure 10 diagram: each row of the data record table (columns 1, 2, 3)
points to a data record in the sequential store. Records shown:
((DOE, JOHN), (80 WHITNEY, ...
((SMITH, BILL), (32 CAPITAL, ...
((041305416), (WRENCH), ...
((DOE, JOHN), (094-63-3152) ...
((EA 3733, CONN), (BUICK, ...]

FORMAT NUMBER    FORMAT NAME
1                EMPLOYEE RECORD
2                CAR REGISTRATION
3                PAY RECORD
4                STOCK INVENTORY

COLUMN
1    Unique record number
2    Format membership of the data record
3    Pointer to data record in sequential store

Figure 10
Representation of the Data Record Table

The information contained in the data record table is tabulated
separately from the master index to achieve savings in storage space
and response time. Storage savings are realized since the addresses
of data records in the sequential store are contained only in the data
record table and are not replicated in the master index for each class
and data element. System response time is reduced for queries that
request all data records of a particular universe of discourse since
the data record table was designed primarily to expedite this type of
request. The retrieval processor extracts all of the necessary data
record addresses in one access of the table. The amount of searching
within the table is minimal.
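
The two functions of the data record table can be sketched as follows.
The sketch is in Python and is illustrative only: the type TableRow,
the two function names, the table rows, and the placeholder record
strings are all assumptions, and a sequential store "address" is
modeled simply as a list index.

```python
from typing import NamedTuple


class TableRow(NamedTuple):
    record_number: int   # column 1: unique record number
    format_number: int   # column 2: format membership
    address: int         # column 3: pointer into the sequential store


# Placeholder records; the real store holds parenthesized data records.
sequential_store = ["<employee record>", "<employee record>", "<pay record>"]

data_record_table = [
    TableRow(1, 1, 0),
    TableRow(2, 1, 1),
    TableRow(3, 3, 2),
]


def addresses_for_format(format_number):
    """Function a: all records of one format, without a master index search."""
    return [row.address for row in data_record_table
            if row.format_number == format_number]


def addresses_for_records(record_numbers):
    """Function b: resolve record numbers obtained from the master index."""
    wanted = set(record_numbers)
    return [row.address for row in data_record_table
            if row.record_number in wanted]


print([sequential_store[a] for a in addresses_for_format(1)])
print([sequential_store[a] for a in addresses_for_records([3])])
```

Because addresses live only in this table, the master index cells need
to carry nothing more than record numbers.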

D.         INFORMATION  FILE 

The  "sequential  store"  is  the  system  information  file,    or  data 
base.     It  contains  the  data  records,   format  definitions,    class  definitions, 
and  the  full  character  representation  of  those  entries  in  the  master 
index  consisting  of  more  than  four  characters.      Figure   11   shows  the 
sequential  store  and  its  relationship  to  the  master  index  and  the  data 
record  table. 

The information file is resident in main core storage. The
variable-length records of this file are sequentially ordered. System
information files are not normally stored in main core unless they are
relatively small (which is the case here). However, it is imperative
that such a file be resident on a direct access storage device in order
to provide satisfactory system response time.

Figure 11
Relationship between the Master Index, Data Record Table, and
Sequential Store

E.         RETRIEVAL  PROCESSOR 

The  retrieval  processor  is  divided  into  three  operations.     The 
identification  operation  determines  the  type  of  query  posed  by  the 
user;  the  search  operation  determines  the  data  record  numbers  which 
satisfy  the  user's  request;  the  output  operation  retrieves  the  resultant 
data  records  from  the  sequential  store  and  prints  them  at  the  terminal. 
Additionally, special messages are output to the user: error messages
warn him of invalid queries, and other messages notify him of
unsatisfied queries.

1.  Query Types

The IR system designer strives to achieve total utility of the
system by providing the user with a powerful retrieval language.
Utility of the data structure used in this system is realized by the
various types of queries available to the user for extracting
information from the data base. There are four major types of queries
available to the user.

a.     Determining  Data  Base  Partitions. 

As previously discussed, the data base may be partitioned
to allow a mix of unrelated information by defining the class structure
of each universe of discourse in the data base. A user who is
unfamiliar with the data base partitions (format names) may easily
determine this information by submitting a special type of query. The
format of the query is simple and consists of the single search key:
"CLASS." This is translated by the retrieval processor as: "Output
the names of all formats contained in the data base." Search of the
master index is then centered at the first record of the index and its
associated chain of data cells which contain the sequential store
addresses of the format names. All format names contained in the data
base are output to the user.

QUERY:     CLASS
RESPONSE:  EMPLOYEE RECORD
           PAY RECORD

b.     Determining  Format  and  Class  Definitions. 

In order to extract data from a specific universe of discourse,
the user must be provided with its class structure. The class structure
determines the format for data record requests. Queries of format
and class definitions must contain, as a search key, the format name
or class name to be defined. The search processor enters the master
index to locate the format name or class name, extracts the address
of its definition in the sequential store, and outputs the definition
directly at the terminal.

QUERY:     EMPLOYEE RECORD
RESPONSE:  (NAME, ADDRESS, AGE, CHILDREN)

QUERY:     NAME
RESPONSE:  (LAST, FIRST)

QUERY:     AGE
RESPONSE:  NO DESCENDANTS
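
The definition queries above can be mimicked with a small lookup. This
Python sketch is illustrative only; the function name define and the
dictionary layout are assumptions, and the wording of the "not found"
reply is modeled on the message text in the program listing.

```python
# Definitions as entered for the EMPLOYEE RECORD example.
definitions = {
    "EMPLOYEE RECORD": ["NAME", "ADDRESS", "AGE", "CHILDREN"],
    "NAME": ["LAST", "FIRST"],
    "ADDRESS": ["STREET", "CITY", "STATE"],
}

# Every format or class name known to the index.
known_names = set(definitions)
for members in definitions.values():
    known_names.update(members)


def define(name):
    """Return the response to a format or class definition query."""
    if name not in known_names:
        return f"REQUEST NOT FULFILLED: {name} WAS NOT FOUND"
    members = definitions.get(name)
    if members is None:
        return "NO DESCENDANTS"   # the class is an end node
    return "(" + ", ".join(members) + ")"


for query in ("EMPLOYEE RECORD", "NAME", "AGE"):
    print(query, "->", define(query))
```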


c.     Data  Element  Retrieval. 

One asset of the data structure concept is that it allows the
user to extract single data elements from the data base which are
members of a particular class and format, or members of a particular
class irrespective of the format membership. Since data elements
are mapped onto the end nodes of their respective tree structures, the
user must use the lowest level classes of the structure as search keys.
Failure to do so prompts the retrieval processor to output corrective
information to the user. The hyphens in the queries below indicate to
the retrieval processor that the expressions are queries and not format
definitions. The processor could identify the expression by
searching the master index for an occurrence of "EMPLOYEE RECORD."
A successful search would indicate that a format definition already
existed in the system. However, use of the hyphen is a simpler and
faster method for positively identifying the type of expression
submitted to the system.

QUERY:     EMPLOYEE RECORD (NAME,-)
RESPONSE:  INVALID QUERY:
           DETERMINE DESCENDANTS OF: NAME
           USE DESCENDANTS AS KEYWORDS

QUERY:     EMPLOYEE RECORD (LAST,-)
RESPONSE:  BROWN
           SMITH
           THOMPSON

To answer the above query, a search is conducted in the master
index for all data elements which are members of the class "LAST"
and are members of the format "EMPLOYEE RECORD." This information
is contained in the data cells appended to each data entry in the
index. Elements which satisfy the query are taken directly from the
master index, and output at the terminal.

In  the  query  below,    the  hyphen  is  used  to  differentiate  between  a 
query  and  a  class  definition  statement.     All  data  elements  which  are 
members  of  the  class   "LAST"  are  output  irrespective  of  format 
membership.     The  format  membership  fields  of  the  data  cells  are 
ignored  during  the  search  of  the  master  index. 

QUERY:     LAST (-)
RESPONSE:  BROWN
           CHAMBERS
           COTTLE
           DOE
           SMITH
           THOMPSON
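
Data element retrieval, with and without a format restriction, can be
sketched as follows. The Python below is an illustration under stated
assumptions: the function name elements_of_class is hypothetical, the
record numbers are invented, the assignment of CHAMBERS, COTTLE, and
DOE to format 3 is assumed, and each data cell is given an explicit
format field as the prose describes.

```python
LEAF_CLASSES = {"LAST", "FIRST", "STREET", "CITY", "STATE", "AGE", "CHILDREN"}

# element -> data cells of the form (record number, class, format number)
master_index = {
    "BROWN":    [(1, "LAST", 1)],
    "SMITH":    [(2, "LAST", 1)],
    "THOMPSON": [(3, "LAST", 1)],
    "CHAMBERS": [(4, "LAST", 3)],
    "COTTLE":   [(5, "LAST", 3)],
    "DOE":      [(6, "LAST", 3)],
    "JOHN":     [(6, "FIRST", 3)],
}


def elements_of_class(class_name, format_number=None):
    """Return elements of an end-node class, optionally within one format."""
    if class_name not in LEAF_CLASSES:
        # Corrective response for a class that still has descendants.
        raise ValueError(f"DETERMINE DESCENDANTS OF: {class_name}")
    hits = []
    for element, cells in master_index.items():
        # With format_number=None the format field is ignored.
        if any(cls == class_name and format_number in (None, fmt)
               for _record, cls, fmt in cells):
            hits.append(element)
    return sorted(hits)


print(elements_of_class("LAST", 1))  # class and format
print(elements_of_class("LAST"))     # class only
```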
d.  Data Record Retrieval.

Data record retrieval is the most valuable and would be the
most frequently used type of request available to the system user.
It extracts the complete data records which satisfy the search keys
contained in the query. To retrieve data records, the
queries contain data elements as search keys and may contain Boolean
"AND," alphabetic and/or numeric ranges, or any combination thereof.
The query format is a fully parenthesized expression as shown in
previous sections. Search keys are positioned in the expression with
respect to class membership, and hyphens are inserted in those positions
for which information is requested. Any variation from the properly
parenthesized expression prompts error messages from the retrieval
processor to the user.

The  retrieval  process  for  the  query  listed  below  is  explained  in 
the  following  paragraphs: 

EMPLOYEE RECORD ((DOE,-), (-,-,CA.), (-), (-))

The format name appearing at the beginning of the query expression
informs the retrieval processor of the universe of discourse in
which the user is interested. The processor then traverses the tree
structure for "EMPLOYEE RECORD" to determine the lowest level
classes in the tree. This information, in conjunction with the proper
use of parentheses in the query expression, allows the processor to
identify the class memberships of the search keys contained in the
query. The user is notified whenever the processor is unable to find a
search key in the master index. In this case, the processor attempts
to recover data which satisfies the remaining search keys. Similar
action takes place when the processor encounters a search key which
is not a member of the class specified in the query, or if a search key
is not a member of the format specified in the query. Additionally,
the user is notified whenever the query is improperly formatted.
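
The pairing of search keys with end-node classes can be sketched as
follows. This Python fragment is a simplification made for
illustration: the function name search_keys is hypothetical, the query
is written with plain hyphens, the expression is flattened rather than
checked group by group against the tree, and ranges, Boolean "AND,"
and multiple entries are not handled.

```python
# End nodes of EMPLOYEE RECORD in left-to-right order.
LEAVES = ["LAST", "FIRST", "STREET", "CITY", "STATE", "AGE", "CHILDREN"]


def search_keys(expression, leaves=LEAVES):
    """Pair each non-hyphen key in the query with its end-node class."""
    depth = 0
    for ch in expression:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")

    # Drop the parentheses and read the positions left to right.
    flat = expression.replace("(", "").replace(")", "")
    tokens = [token.strip() for token in flat.split(",")]
    if len(tokens) != len(leaves):
        raise ValueError("query does not match the class structure")
    return {cls: tok for cls, tok in zip(leaves, tokens) if tok != "-"}


print(search_keys("((DOE,-), (-,-,CA.), (-), (-))"))
```

For the query discussed above, the keys "DOE" and "CA." are paired with
the classes LAST and STATE respectively; every hyphenated position is
left unconstrained.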

Each search key in the query is processed sequentially. The
retrieval processor searches the master index for an occurrence of
each key. Record numbers which contain an occurrence of the search
key are extracted and stored in a list. After all search keys have been
processed, the retrieval processor "ANDS" the record numbers in the
list to determine which records satisfy the query. For example,
assuming that two key words are used and record numbers 5, 32, and
67 satisfy the first key word, and record numbers 32 and 67 satisfy
the second key word, records 32 and 67 are output to the user.
Record numbers which satisfy the query are passed to the "output"
section of the retrieval processor which retrieves the sequential store
addresses of the records from the data record table and prints the
records at the terminal.
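
The "AND" step amounts to intersecting the record-number lists gathered
for the individual search keys. The following Python sketch illustrates
the idea with the numbers from the example above; the function name
and_record_numbers is an assumption, and the set intersection used here
stands in for whatever list comparison the FORTRAN program performs.

```python
def and_record_numbers(lists):
    """Keep only the record numbers present in every search key's list."""
    if not lists:
        return []
    common = set(lists[0])
    for numbers in lists[1:]:
        common &= set(numbers)
    return sorted(common)


first_key = [5, 32, 67]   # records containing the first key word
second_key = [32, 67]     # records containing the second key word
print(and_record_numbers([first_key, second_key]))
```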

A user has the ability to immediately examine the results of his
query since the system is interactive. The results of one query may
prompt the user to submit another request, either broadening or
narrowing the request through judicious use of search keys. In any
case, the user is guaranteed that if the information that he seeks is
contained in the data base, he will have quick and easy access to it.
Appendix A contains a sample run of the fact-retrieval system and
demonstrates all of the queries available to a user and the system
responses.

F.         ALTERING  THE  DATA  BASE 

1.  Changes  and  Deletions 

Due  to  the  experimental  nature  of  the  system,    no  utility 
routines  have  been  provided  for  deleting  records  or  making  changes 
to  existing  records.     Alterations  are  accomplished  by  manually 
changing  the  card  images  in  the  data  files. 

2.  Additions 

The addition of data records to existing data sets or the
submission of new universes of discourse is accomplished most easily
without special utility routines. This feature is inherently built into
the system through the data structuring technique. Addition of a new
universe of discourse is accomplished by submitting format and class
definitions, and associated data records, either on-line through the
terminal (automatically) or off-line with card images (manually). New
data records may also be added to existing data files automatically or
manually.




VIII.       CONCLUSIONS 

Characteristics  of  the  data-structuring  concept  as  used  in  a 
general-purpose  fact-retrieval  system  have  been  discussed  throughout 
the preceding sections. These concepts are summarized here.

The  data  structuring  technique  encompasses  the  concept  of 
hierarchical  classification  which  is  the  most  widely  used  method  of 
indexing.     Hierarchical  classification  of  data  is  a  relatively  simple 
technique  to  use  but  possesses  the  power  to  divide  and  subdivide  a 
universe  of  discourse  into  more  specific  subjects.     Additionally, 
hierarchical  structures  may  be  created  to  include  a  domain  of  subjects. 
This  is  advantageous  for  use  in  a  fact-retrieval  system,    as  previously 
demonstrated,   by  providing  a  mix  of  structures  in  a  single  data  base. 
Therefore,    users  with  differing  interests  are  provided  simultaneous 
access  to  a  single  system  since  each  is  provided  a  "personal"  retrieval 
system  within  a  larger  retrieval  system.     In  addition,    the  hierarchical 
structure  provides  a  user  with  multiple  avenues  of  access  into  his 
information  file. 

Parenthesized expressions serve as an intermediate language
between the query processor and the information retrieval system.
The query processor is able to determine the class memberships of
elements within an expression by examination of the parenthesized
form. It is apparent, however, that the use of parenthesized
expressions is cumbersome and demanding since misplacing parentheses is
easy to do and causes loss of meaning of the expression. On the other
hand, it can be argued that the technique of parenthesizing expressions
is powerful and an equally powerful substitute is difficult to theorize.

COMPUTER PROGRAM

      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION FTAB(30)
      DIMENSION UPRAN(30)
      DIMENSION RECTAB(100)
      DIMENSION TOP(603),RIGHT(602),DOWN(601)
      DIMENSION WORK(241),ACUM(241),FRST4(241)
      DIMENSION LEVEL(50),DRTAB(50,3),DESTAB(30),NODTAB(50)
      DIMENSION RNO(603),CMEM(602),NEXT(601)
      DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000)
      COMMON /ONE/TOP,AVAIL1,P,Q,RR,DD,YY
      COMMON /TWO/RNO,AVAIL2,S,R,CM,RM
      COMMON /THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
      COMMON /FOUR/NODTAB,FD
      COMMON /FIVE/MULT,MK,EK
      COMMON /SIX/S2,AT,AE,HCTR,UPRAN
      EQUIVALENCE (TOP(3),RIGHT(2),DOWN(1))
      EQUIVALENCE (RNO(3),CMEM(2),NEXT(1))
      DATA OP/'('/,CP/')'/,BLANK/' '/,COMMA/','/,STAR/'*'/,
     1EQS/'='/,CX/'C'/,DX/'D'/,DOLS/'$'/
      DATA LX/'L'/,AX/'A'/,SX/'S'/,COL/':'/,
     1AMP/'&'/,HYP/'-'/
C     INITIALIZE ARRAYS, CERTAIN COUNTERS AND SUBSCRIPTS.
      AVAIL1=1
      Q=AVAIL1
      AVAIL2=1
      S=AVAIL2
      R=S
      CALL INIT1
      CALL INIT2
      TERM=0
      ER=0
      FT=0
      MI=1
      FNUM=0
      RNUM=0
      DO 20 I=1,50
      DO 21 J=1,3
      DRTAB(I,J)=BLANK
   21 CONTINUE
   20 CONTINUE
      DO 23 I=1,500
      MINDEX(I,7)=BLANK
      DO 24 J=1,4
      MINDEX(I,J)=BLANK
   24 CONTINUE
   23 CONTINUE
      DO 26 I=1,30
      FTAB(I)=BLANK
   26 DESTAB(I)=BLANK
C     RESERVE THE FIRST ROW OF THE MASTER INDEX FOR 'CLASS'.
C     MINDEX(1,8) POINTS TO CELLS WHICH CONTAIN FORMAT
C     NUMBERS, AND POINTERS TO THE FULL CHARACTER
C     REPRESENTATION OF THE FORMAT NAME IN SEQUENTIAL STORAGE.
C     CELLS ARE ATTACHED AS THE RECORD FORMATS ARE PROCESSED.
      MINDEX(1,1)=CX
      MINDEX(1,2)=LX
      MINDEX(1,3)=AX
      MINDEX(1,4)=SX
      MINDEX(1,5)=0
      MINDEX(1,6)=1
      MINDEX(1,8)=0
      SEQ(1)=CX
      SEQ(2)=LX
      SEQ(3)=AX
      SEQ(4)=SX
      SEQ(5)=SX
      SEQ(6)=STAR
      SI=6
   44 DO 28 I=1,30
   28 SERCH(I)=BLANK
   45 J=1
      K=80
      PCTR=0
      DO 27 I=1,241
      WORK(I)=BLANK
   27 ACUM(I)=BLANK
      IF(TERM.EQ.1.OR.ER.EQ.1) GO TO 399

C     READ ONE RECORD INTO THE 'WORK' ARRAY AND DETERMINE
C     IF THE PARENTHESES ARE BALANCED OR IF THE RECORD
C     EXCEEDS 240 CHARACTERS.
   46 READ (4,2,END=900) (WORK(I),I=J,K)
    2 FORMAT (80A1)
      GO TO 41
  399 READ (5,2) (WORK(I),I=J,K)
      IF (WORK(1).EQ.DOLS.OR.WORK(2).EQ.DOLS.OR.WORK(3)
     1.EQ.DOLS) GO TO 1000
   41 DO 30 L=J,K
      IF(WORK(L).EQ.OP) PCTR=PCTR+1
      IF(WORK(L).EQ.CP) PCTR=PCTR-1
      IF(WORK(L+1).EQ.STAR) GO TO 32
   30 CONTINUE
      J=J+80
      IF(K.GT.240) GO TO 950
      IF(TERM.EQ.1.OR.ER.EQ.1) GO TO 399
      GO TO 46
   32 IF(PCTR) 925,40,925
C     DEBLANK THE RECORD CONTAINED IN WORK, LOAD IT INTO THE
C     ACCUMULATOR, AND DETERMINE ITS LENGTH.
   40 ER=0
      J=0
      DO 47 I=1,241
      IF(WORK(I).EQ.BLANK) GO TO 47
      J=J+1
      ACUM(J)=WORK(I)
      IF(WORK(I).EQ.STAR) GO TO 48
   47 CONTINUE
   48 N=J-1

C     DETERMINE THE RECORD TYPE AND BRANCH TO THE
C     APPROPRIATE BLOCK OF CODE FOR PROCESSING.
C
      CALL IDENT(&600,&800,&700)
C
C     THIS BLOCK OF CODE PROCESSES INPUT RECORDS,
C     I.E., FORMAT DEFINITIONS, CLASS DEFINITIONS, AND DATA
C     RECORDS. FORMAT OR CLASS TREES ARE STRUCTURED, ENTRIES
C     MADE IN THE MASTER INDEX, DATA RECORD TABLE, AND
C     SEQUENTIAL STORAGE.
      IF(ACUM(1).EQ.EQS) GO TO 251
      DO 50 I=1,N
      MK=I
      IF(ACUM(I).EQ.OP) GO TO 55
   50 SERCH(I)=ACUM(I)
   55 IF (ACUM(MK+1).NE.CP) GO TO 56
      WRITE(6,280)
  280 FORMAT(1H ,'INVALID QUERY: MISSING HYPHEN')
      GO TO 44
   56 NN=N
      N=MK-1
      CALL MISRCH
      IF(ANS.EQ.0) GO TO 57
      FNO=MINDEX(SJ,5)
      GO TO 200

C     THIS RECORD IS A FORMAT.
C     ADD A CELL TO MINDEX(1,8) FOR THIS FORMAT.
   57 FNUM=FNUM+1
      IF(R.NE.S) NEXT(R)=0
      CALL GET2
      NEXT(R)=0
      IF(MINDEX(1,8).NE.0) GO TO 71
      MINDEX(1,8)=R
      CM=R
      GO TO 73
   71 CM=MINDEX(1,8)
   72 IF(NEXT(CM).EQ.0) GO TO 78
      CM=NEXT(CM)
      GO TO 72
   78 NEXT(CM)=R
      CM=NEXT(CM)
   73 RNO(CM)=FNUM
      SI=SI+1
      CMEM(CM)=SI
      MI=1
   74 MI=MI+1
      IF(MINDEX(MI,1).NE.BLANK) GO TO 74
      LK=MK-1
      DO 77 I=1,LK
      IF(I.GT.4) GO TO 60
      MINDEX(MI,I)=ACUM(I)
   60 SEQ(SI)=ACUM(I)
      SI=SI+1
   77 CONTINUE
   79 SEQ(SI)=STAR
      SI=SI+1
C     SET A POINTER TO THE FORMAT DEFINITION STORED IN 'SEQ'
      MINDEX(MI,7)=SI
      DO 82 L=MK,NN
      SEQ(SI)=ACUM(L)
      SI=SI+1
   82 CONTINUE
      SEQ(SI)=STAR
C     SET A POINTER TO THE FORMAT NAME STORED IN 'SEQ'.
      MINDEX(MI,6)=CMEM(CM)
      MINDEX(MI,5)=FNUM
C     INITIALIZE A HEADER CELL FOR THE FORMAT TREE.
      CALL GET1
      AVAIL1=P
      MINDEX(MI,8)=P
      TOP(P)=MI
C     CHAIN A CELL TO THE HEADER FOR THE FIRST CLASS OF THIS
C     FORMAT DEFINITION.
      CALL GET1
      RR=P
      DD=P
      DOWN(P)=0

C     DETERMINE IF EACH CLASS HAS BEEN PREVIOUSLY DEFINED
C     EACH CLASS IS UNIQUELY DEFINED, THEREFORE, THERE WILL BE
C     NO DUPLICATE CLASS ENTRIES IN THE MASTER INDEX.
C     IF A CLASS HAS BEEN PREVIOUSLY DEFINED, ITS
C     DESCENDANTS ARE LOCATED AND ADDED TO THE TREE.
      DI=0
      DJ=0
   85 DO 80 IK=1,30
      SERCH(IK)=BLANK
   80 FRST4(IK)=BLANK
   81 MK=MK+1
      IF(ACUM(MK).EQ.STAR) GO TO 170
      IF(ACUM(MK).EQ.COMMA.OR.ACUM(MK).EQ.OP) GO TO 81
      I=MK
      J=1
   90 SERCH(J)=ACUM(I)
      IF(J.GT.4) GO TO 95
      FRST4(J)=ACUM(I)
   95 I=I+1
      IF(ACUM(I).EQ.COMMA.OR.ACUM(I).EQ.CP) GO TO 100
      J=J+1
      GO TO 90
  100 N=J
      CALL MISRCH
      IF(ANS.EQ.1) GO TO 150
      MI=1
  105 MI=MI+1
      IF(MINDEX(MI,1).NE.BLANK) GO TO 105
      DO 110 JK=1,4
  110 MINDEX(MI,JK)=FRST4(JK)
      IF(J.GT.4) GO TO 120
      MINDEX(MI,6)=0
      GO TO 130
  120 SI=SI+1
      MINDEX(MI,6)=SI
      DO 125 K=1,J
      SEQ(SI)=SERCH(K)
  125 SI=SI+1
      SEQ(SI)=STAR
  130 MINDEX(MI,5)=CX
      IF(TOP(RR).EQ.0) GO TO 131
      CALL GET1
      RIGHT(RR)=P
      RR=RIGHT(RR)
  131 TOP(RR)=MI
      MINDEX(MI,8)=RR
      DOWN(RR)=0
C     PROCESS THE NEXT CLASS IN THE FORMAT DEFINITION.
C     IF NONE THEN READ IN THE NEXT RECORD.
      MK=I
      GO TO 85

  150 IF(MINDEX(SJ,5).EQ.LX) GO TO 151
      DI=DI+1
      DESTAB(DI)=SJ
      MI=SJ
  151 IF(TOP(RR).EQ.0) GO TO 155
      CALL GET1
      RIGHT(RR)=P
      RR=RIGHT(RR)
  155 TOP(RR)=SJ
      MINDEX(SJ,8)=RR
      DOWN(RR)=0
      MK=I
      GO TO 85
C     ADD THE PREVIOUSLY DEFINED CLASSES TO THE TREE.
  170 MK=0
      IF(DESTAB(1).NE.BLANK) GO TO 175
  171 DO 173 I=1,30
      DESTAB(I)=BLANK
  173 CONTINUE
      GO TO 45
  175 DJ=DJ+1
      IF(DESTAB(DJ).EQ.BLANK) GO TO 171
  180 L=MINDEX(DESTAB(DJ),7)
      DO 193 J=1,241
  193 ACUM(J)=BLANK
      J=1
  197 ACUM(J)=SEQ(L)
      L=L+1
      J=J+1
      IF(SEQ(L).NE.STAR) GO TO 197
      ACUM(J)=SEQ(L)
      RR=MINDEX(DESTAB(DJ),8)
      CALL GET1
      DOWN(RR)=P
      RR=DOWN(RR)
      GO TO 85

C     THIS BLOCK OF CODE PROCESSES CLASS DEFINITIONS
  200 IF(MINDEX(SJ,5).NE.CX.AND.MINDEX(SJ,5).NE.LX) GO TO 250
      SI=SI+1
      MINDEX(SJ,7)=SI
      DO 215 L=MK,NN
      SEQ(SI)=ACUM(L)
  215 SI=SI+1
      SEQ(SI)=STAR
      RR=MINDEX(SJ,8)
      CALL GET1
      DOWN(RR)=P
      RR=DOWN(RR)
      DOWN(RR)=0
      GO TO 85
C     THE ACCUMULATOR CONTAINS A DATA RECORD. TRAVERSE THE
C     TREE AND LOCATE ALL END NODES OF ALL BRANCHES. DATA
C     ELEMENTS ARE MAPPED ONTO THEIR RESPECTIVE CLASSES.
C     IF THE FIRST CHARACTER OF THE RECORD IS AN ASTERISK
C     THEN THE RECORD IS A DATA RECORD AND A MEMBER OF THE
C     SAME FORMAT AS THE LAST DATA RECORD PROCESSED.
  250 CALL TRAV
  251 NI=1
      NN=J-1
      RNUM=RNUM+1
C     STORE THE RECORD IN SEQUENTIAL STORAGE AND MAKE DATA
C     RECORD TABLE ENTRIES.
      SI=SI+1
      DRTAB(RNUM,1)=RNUM
      DRTAB(RNUM,2)=FNO
      DRTAB(RNUM,3)=SI
      IF(ACUM(1).EQ.EQS) MK=2
      DO 260 I=MK,NN
      SEQ(SI)=ACUM(I)
  260 SI=SI+1
      SEQ(SI)=STAR
C     PROCESS EACH DATA ELEMENT IN THE RECORD.
      SET=0
      CALL CHAR4(ACUM,FRST4,NN)
C     DETERMINE IF A MULTIPLE ENTRY EXISTS,
C     E.G., ((JOHN DOE), (ROBERT SMITH))
      MK=MK+1
  255 IF(ACUM(MK).EQ.OP) CALL MENT
      IF(MULT.EQ.1) MM=NI
  265 DO 267 I=1,30
  267 SERCH(I)=BLANK
      J=1
  270 SERCH(J)=ACUM(MK)
      J=J+1
      MK=MK+1
  271 IF(ACUM(MK).EQ.STAR) GO TO 44
      IF(ACUM(MK).EQ.COMMA.OR.ACUM(MK).EQ.CP) GO TO 275
      GO TO 270

C     DETERMINE IF THE DATA ELEMENT HAS BEEN PREVIOUSLY
C     ENTERED IN THE MASTER INDEX. IF NOT MAKE ENTRIES IN
C     THE MASTER INDEX, SEQUENTIAL STORE, AND INITIALIZE THE
C     MEMBERSHIP CELLS.
  275 N=J-1
      CALL MISRCH
      IF(ANS.EQ.0) GO TO 535
      SET=SET+4
      DO 455 JJ=1,30
  455 SERCH(JJ)=BLANK
      CM=MINDEX(SJ,8)
  460 IF(NEXT(CM).EQ.0) GO TO 465
      CM=NEXT(CM)
      GO TO 460
  465 NEXT(R)=0
      CALL GET2
      NEXT(CM)=R
      RNO(R)=RNUM
      CMEM(R)=NODTAB(NI)
      SJ=NODTAB(NI)
      NI=NI+1
      IF(NODTAB(NI).NE.BLANK) GO TO 480
      GO TO 45
  480 IF(MINDEX(SJ,7).EQ.BLANK) GO TO 493
      CM=MINDEX(SJ,7)
  485 IF(NEXT(CM).EQ.0) GO TO 490
      CM=NEXT(CM)
      GO TO 485
  490 NEXT(R)=0
      CALL GET2
      NEXT(CM)=R
      GO TO 495
  493 NEXT(R)=0
      CALL GET2
      MINDEX(SJ,7)=R
  495 RNO(R)=RNUM
      CMEM(R)=FNO
  492 IF(ACUM(MK).NE.COMMA) GO TO 498
  497 MK=MK+1
      IF(ACUM(MK).EQ.OP) GO TO 255
      GO TO 265
  498 MK=MK+1
      IF(ACUM(MK).EQ.STAR) GO TO 44
      IF(ACUM(MK).EQ.COMMA.OR.ACUM(MK).EQ.CP) GO TO 498
      IF(ACUM(MK).EQ.OP) GO TO 500
      GO TO 265
  500 IF(MULT.NE.1) GO TO 255
      MULT=0
      NI=MM
  505 MK=MK+1
      IF(ACUM(MK).EQ.STAR) GO TO 44
      IF(ACUM(MK).EQ.OP) GO TO 505
      GO TO 265
  535 DO 540 M=1,4
      SET=SET+1
  540 MINDEX(SJ,M)=FRST4(SET)
      IF(J.LE.5) GO TO 555
      SI=SI+1
      MINDEX(SJ,6)=SI
      LL=1
  545 SEQ(SI)=SERCH(LL)
      LL=LL+1
      IF(LL.EQ.J) GO TO 550
      SI=SI+1
      GO TO 545
  550 SI=SI+1
      SEQ(SI)=STAR
  555 IF(J.LE.5) MINDEX(SJ,6)=0
      MINDEX(SJ,5)=DX
      NEXT(R)=0
      CALL GET2
      MINDEX(SJ,8)=R
      RNO(R)=RNUM
      CMEM(R)=NODTAB(NI)
      L=NODTAB(NI)
      NI=NI+1
      IF(MINDEX(L,7).EQ.BLANK) GO TO 580
      CM=MINDEX(L,7)
  565 IF(NEXT(CM).EQ.0) GO TO 570
      CM=NEXT(CM)
      GO TO 565
  570 NEXT(R)=0
      CALL GET2
      NEXT(CM)=R
      RNO(R)=RNUM
      CMEM(R)=FNO
      GO TO 492
  580 NEXT(R)=0
      CALL GET2
      MINDEX(L,7)=R
      RNO(R)=RNUM
      CMEM(R)=FNO
      GO TO 492

C     THIS BLOCK OF CODE PROCESSES FORMAT DEFINITION, CLASS
C     DEFINITION, AND FORMAT NAME QUERIES.
  600 DO 605 I=1,N
  605 SERCH(I)=ACUM(I)
      CALL MISRCH
      IF(ANS.EQ.1) GO TO 615
  607 WRITE(6,610) (SERCH(I),I=1,30)
  610 FORMAT(1H ,'REQUEST NOT FULFILLED:',
     1/,T2,30A1,'WAS NOT FOUND')
      GO TO 44
  615 IF(SJ.NE.1) GO TO 645
C     OUTPUT ALL FORMAT NAMES.
      CM=MINDEX(1,8)
  618 ST=CMEM(CM)
      SE=ST
  620 IF(SEQ(SE+1).EQ.STAR) GO TO 625
      SE=SE+1
      GO TO 620
  625 WRITE (6,630) RNO(CM),(SEQ(I),I=ST,SE)
  630 FORMAT(1H ,'FORMAT NUMBER',I3,2X,30A1)
      IF(NEXT(CM).EQ.0) GO TO 635
      CM=NEXT(CM)
      GO TO 618
  635 WRITE (6,640)
  640 FORMAT (/T2,'REQUEST COMPLETE')
      FT=0
      GO TO 44
  645 IF(MINDEX(SJ,5).NE.DX) GO TO 650
  646 WRITE(6,647) (SERCH(I),I=1,30)
  647 FORMAT(1H ,'INVALID QUERY:',
     1/,T2,30A1,'IS A DATA ELEMENT')
      GO TO 44
  650 IF(MINDEX(SJ,5).NE.CX.AND.MINDEX(SJ,5).NE.LX)
     1 GO TO 670
      IF(MINDEX(SJ,5).NE.LX) GO TO 652
      WRITE(6,653) (SERCH(I),I=1,30)
  653 FORMAT(1H ,30A1,'HAS NO DESCENDANTS')
      GO TO 44
  652 ST=MINDEX(SJ,7)
      SE=ST
  655 IF(SEQ(SE+1).EQ.STAR) GO TO 660
      SE=SE+1
      GO TO 655
  660 WRITE(6,665) (ACUM(I),I=1,N),(SEQ(IX),IX=ST,SE)
  665 FORMAT(1H ,30A1,3(80A1,/))
      GO TO 44
  670 DO 673 I=1,N
  673 SERCH(I)=ACUM(I)
      CALL MISRCH
      IF(ANS.NE.1) GO TO 607
      GO TO 652

C     IF THE KEYWORD SPECIFIED IN THE QUERY IS A FORMAT NAME
C     THEN OUTPUT ALL DATA RECORDS WHICH ARE MEMBERS OF THAT
C     FORMAT. IF THE KEYWORD IS A CLASS THEN OUTPUT ALL DATA
C     ELEMENTS WHICH ARE MEMBERS OF THAT CLASS.
  700 DO 705 I=1,30
      IF(ACUM(I).EQ.OP) GO TO 707
      N=I
      SERCH(I)=ACUM(I)
  705 CONTINUE
  707 CALL MISRCH
      IF(ANS.EQ.1) GO TO 709
      WRITE(6,610) (SERCH(I),I=1,30)
      GO TO 44
  709 IF(MINDEX(SJ,5).EQ.DX) GO TO 646
  710 IF(MINDEX(SJ,5).EQ.CX.OR.MINDEX(SJ,5).EQ.LX) GO TO 750
  720 J=0
  725 J=J+1
      IF(DRTAB(J,2).EQ.BLANK) GO TO 635
      IF(DRTAB(J,2).NE.MINDEX(SJ,5)) GO TO 725
      ST=DRTAB(J,3)
      SE=ST
  730 IF(SEQ(SE+1).EQ.STAR) GO TO 733
      SE=SE+1
      GO TO 730
  733 WRITE (6,735) (SEQ(I),I=ST,SE)
  735 FORMAT (2(/,T2,120A1))
      GO TO 725
  750 IF(MINDEX(SJ,5).EQ.LX) GO TO 760
      WRITE(6,753) (SERCH(I),I=1,30)
  753 FORMAT(1H ,'INVALID QUERY:',
     1/,T2,'DETERMINE DESCENDANTS OF: ',30A1,
     2/,T2,'USE DESCENDANTS AS KEYWORDS')
      GO TO 44
760  MI=1
765  MI=MI+1
IF(MINDEX(MI,1).EQ.BLANK) GO TO 635
IF(MINDEX(MI,5).NE.DX) GO TO 765
CM=MINDEX(MI,8)
770  IF(CMEM(CM).EQ.SJ) GO TO 771
773  IF(NEXT(CM).EQ.0) GO TO 765
CM=NEXT(CM)
GO TO 770
C IF FT=1 THEN OUTPUT ONLY THOSE DATA ELEMENTS WHICH
C ARE MEMBERS OF THE CLASS AND FORMAT SPECIFIED
C IN THE QUERY.
771  IF(FT.EQ.0) GO TO 774
IF(DRTAB(RNO(CM),2).NE.FNO) GO TO 773
774  IF(MINDEX(MI,6).EQ.0) GO TO 785
ST=MINDEX(MI,6)
SE=ST
778  IF(SEQ(SE+1).EQ.STAR) GO TO 780
SE=SE+1
GO TO 778
780  WRITE(6,783) (SEQ(I),I=ST,SE)
783  FORMAT(/,T2,30A1)
GO TO 765
785  WRITE(6,788) (MINDEX(MI,I),I=1,4)
788  FORMAT  (/,T2,4A1) 

GO  TO  765 

C THIS BLOCK OF CODE PROCESSES HYPHEN AND
C BOOLEAN 'AND' REQUESTS.
800  DO 801 I=1,100
RECTAB(I)=BLANK
801  CONTINUE
RI=0
RSMK=1
DO 805 I=1,30
IF(ACUM(I).EQ.OP) GO TO 810
N=I
SERCH(I)=ACUM(I)
805  CONTINUE
810  CALL MISRCH
IF(ANS.EQ.0) GO TO 607
IF(MINDEX(SJ,5).NE.CX.AND.MINDEX(SJ,5).NE.LX)
1    GO TO 820
WRITE(6,815) (SERCH(I),I=1,30)
815  FORMAT(1H ,'INVALID QUERY:',
1    /,T2,30A1,'IS NOT A FORMAT NAME')
GO TO 44
820  IF(MINDEX(SJ,5).EQ.DX) GO TO 646
FNO=MINDEX(SJ,5)
CALL TRAV
DO 821 I=1,50
IF(NODTAB(I).EQ.BLANK) GO TO 823
NODE=I
821  CONTINUE
823  CLAS=0
DO 824 I=1,241
IF(ACUM(I).EQ.COMMA) CLAS=CLAS+1
IF(ACUM(I).EQ.STAR) GO TO 826
824  CONTINUE
826  IF(CLAS.LT.NODE) GO TO 827
WRITE(6,828)
828  FORMAT(1H ,'INVALID QUERY: NUMBER OF KEYWORD ',
1    /,T2,'POSITIONS EXCEEDS THE NUMBER OF CLASSES ',
2    /,T2,'CONTAINED IN THE SPECIFIED FORMAT')
GO  TO  44 

827  CALL QSCAN(&822,&855,&870)
C CMNO IS CLASS MEMBERSHIP NO.
822  CMNO=NODTAB(HCTR)
DI=0
CALL MISRCH
IF(ANS.EQ.1) GO TO 831
WRITE(6,825) (SERCH(I),I=1,30)
825  FORMAT(1H ,30A1,2X,'WAS NOT FOUND:',
1    /,T2,'RECORDS SATISFYING OTHER KEYWORDS,IF ANY,',
2    'ARE LISTED')
AT=AE
CALL QSCAN1(&822,&855,&870)
830  IF(RECTAB(RI).EQ.BLANK.OR.RECTAB(RI).EQ.STAR)
1    GO TO 893
RI=RI+1
RECTAB(RI)=STAR
RSMK=RI
AT=AE
CALL QSCAN1(&822,&855,&870)
831  IF(MINDEX(SJ,5).EQ.CX) GO TO 834
IF(MINDEX(SJ,5).EQ.DX) GO TO 835
FT=1
GO TO 760
834  WRITE(6,753) (SERCH(I),I=1,30)
GO TO 44
835  CM=MINDEX(SJ,8)
840  IF(CMEM(CM).EQ.CMNO) GO TO 842
841  IF(NEXT(CM).EQ.0.AND.DI.EQ.0) GO TO 847
IF(NEXT(CM).EQ.0) GO TO 843
CM=NEXT(CM)
GO TO 840
842  DI=RNO(CM)
IF(DRTAB(DI,2).EQ.FNO) GO TO 850
GO TO 841
847  WRITE(6,848) (SERCH(I),I=1,30)
848  FORMAT(1H ,30A1,'WAS FOUND BUT IS NOT A MEMBER OF',
1    /,T2,'THE CLASS SPECIFIED IN THE QUERY:',
2    /,T2,'RECORDS SATISFYING OTHER KEYWORDS,IF ANY,',
3    'ARE LISTED')
AT=AE
CALL QSCAN1(&822,&855,&870)

843  WRITE(6,851) (SERCH(I),I=1,30)
851  FORMAT(1H ,30A1,'WAS FOUND BUT IS NOT A MEMBER OF',
1    /,T2,'THE FORMAT SPECIFIED IN THE QUERY:',
2    /,T2,'IT IS A MEMBER OF:')
I=0
CM=MINDEX(SJ,8)
310  I=I+1
FTAB(I)=DRTAB(RNO(CM),2)
315  IF(NEXT(CM).EQ.0) GO TO 325
CM=NEXT(CM)
DO 320 FI=1,30
IF(FTAB(FI).EQ.BLANK) GO TO 310
IF(FTAB(FI).EQ.DRTAB(RNO(CM),2)) GO TO 315
320  CONTINUE
325  CM=MINDEX(1,8)
327  DO 330 I=1,30
IF(FTAB(I).EQ.BLANK) GO TO 333
IF(FTAB(I).EQ.RNO(CM)) GO TO 335
330  CONTINUE
333  IF(NEXT(CM).EQ.0) GO TO 355
CM=NEXT(CM)
GO TO 327
335  ST=CMEM(CM)
SE=ST
340  IF(SEQ(SE+1).EQ.STAR) GO TO 345
SE=SE+1
GO TO 340
345  WRITE(6,350) (SEQ(IX),IX=ST,SE)
350  FORMAT(1H ,T2,30A1)
GO TO 333
355  DO 357 I=1,30
357  FTAB(I)=BLANK
WRITE(6,360)
360  FORMAT(1H ,T2,'RECORDS SATISFYING OTHER KEYWORDS,',
1    /,T2,'IF ANY, ARE LISTED')
AT=AE
CALL QSCAN1(&822,&855,&870)
850  DO 852 RK=RSMK,100
IF(RECTAB(RK).EQ.BLANK) GO TO 853
IF(RECTAB(RK).EQ.RNO(CM)) GO TO 854
852  CONTINUE
853  RI=RI+1
RECTAB(RI)=RNO(CM)
854  IF(S2.EQ.1) GO TO 362
RI=RI+1
RECTAB(RI)=STAR
RSMK=RI
362  AT=AE
CALL QSCAN1(&822,&855,&870)
C THIS BLOCK OF CODE PROCESSES ALPHABETIC AND/OR
C NUMERIC RANGE REQUESTS.
855  CMNO=NODTAB(HCTR)
SJ=2
857  IF(MINDEX(SJ,5).NE.DX) GO TO 865
I=1
IF(MINDEX(SJ,6).EQ.0) GO TO 860
J=MINDEX(SJ,6)
858  IF(SEQ(J).EQ.SERCH(I).AND.SEQ(J).EQ.UPRAN(I))
1    GO TO 859
IF(SEQ(J).GE.SERCH(I).AND.SEQ(J).LE.UPRAN(I))
1    GO TO 861
GO TO 865
861  IF(SEQ(J).EQ.SERCH(I)) GO TO 863
IF(SEQ(J).EQ.UPRAN(I)) GO TO 866
GO TO 864
859  I=I+1
J=J+1
IF(SERCH(I).EQ.BLANK) GO TO 864
GO TO 858
863  I=I+1
J=J+1
IF(SERCH(I).EQ.BLANK.OR.SEQ(J).GE.SERCH(I))
1    GO TO 864
GO TO 865
866  I=I+1
J=J+1
IF(SERCH(I).EQ.BLANK.OR.SEQ(J).LE.UPRAN(I))
1    GO TO 864
GO TO 865
860  IF(MINDEX(SJ,I).EQ.SERCH(I).AND.MINDEX(SJ,I)
1    .EQ.UPRAN(I)) GO TO 862
IF(MINDEX(SJ,I).GE.SERCH(I).AND.MINDEX(SJ,I)
1    .LE.UPRAN(I)) GO TO 871
GO TO 865
871  IF(MINDEX(SJ,I).EQ.SERCH(I)) GO TO 873
IF(MINDEX(SJ,I).EQ.UPRAN(I)) GO TO 876
GO TO 864
862  I=I+1
IF(SERCH(I).EQ.BLANK) GO TO 864
GO TO 860
873  I=I+1
IF(SERCH(I).EQ.BLANK.OR.MINDEX(SJ,I).GE.SERCH(I))
1    GO TO 864
GO TO 865
876  I=I+1
IF(SERCH(I).EQ.BLANK.OR.MINDEX(SJ,I).LE.UPRAN(I))
1    GO TO 864
865  SJ=SJ+1
IF(MINDEX(SJ,1).NE.BLANK) GO TO 857
S1=0
GO TO 830
864  CM=MINDEX(SJ,8)
856  IF(CMEM(CM).EQ.CMNO) GO TO 868
867  IF(NEXT(CM).EQ.0) GO TO 865
CM=NEXT(CM)
GO TO 856
868  DI=RNO(CM)
IF(DRTAB(DI,2).EQ.FNO) GO TO 872
GO TO 867
872  DO 883 RK=RSMK,100
IF(RECTAB(RK).EQ.BLANK) GO TO 869
IF(RECTAB(RK).EQ.RNO(CM)) GO TO 865
883  CONTINUE
869  RI=RI+1
RECTAB(RI)=RNO(CM)
GO TO 865

C RECORD NUMBERS WHICH SATISFY INDIVIDUAL QUERY KEYWORDS
C ARE STORED IN A LIST ('RECTAB'). THIS BLOCK OF CODE
C SEARCHES 'RECTAB' TO DETERMINE WHICH RECORDS SATISFY
C ALL QUERY KEYWORDS.
870  IF(RI.EQ.0) GO TO 893
IF(RECTAB(RI).EQ.STAR) GO TO 887
RI=RI+1
RECTAB(RI)=STAR
887  RECTAB(RI+1)=DOLS
W=0
RI=1
TRNO=RECTAB(RI)
RS=1
880  RS=RS+1
IF(RECTAB(RS).NE.STAR) GO TO 880
RMK=RS
881  RS=RS+1
882  IF(RECTAB(RS).EQ.DOLS) GO TO 895
885  IF(RECTAB(RS).EQ.TRNO) GO TO 890
RS=RS+1
IF(RECTAB(RS).NE.STAR) GO TO 885
IF(W.EQ.0) GO TO 893
GO TO 635
890  RS=RS+1
W=1
IF(RECTAB(RS).EQ.STAR) GO TO 881
GO TO 890
893  WRITE(6,892)
892  FORMAT(1H ,'REQUEST NOT FULFILLED: ',
1    'NO RECORDS SATISFY THE QUERY')
GO TO 44
895  ST=DRTAB(TRNO,3)
SE=ST
896  IF(SEQ(SE+1).EQ.STAR) GO TO 898
SE=SE+1
GO TO 896
898  WRITE(6,899) (SEQ(I),I=ST,SE)
899  FORMAT(/,T2,2(/,T2,120A1))
RI=RI+1
IF(RI.EQ.RMK) GO TO 635
TRNO=RECTAB(RI)
RS=RMK+1
GO TO 882

900  WRITE(6,10)
10  FORMAT(1H0,T40,'* * * FLY NAVY * * *')
TERM=1
NEXT(R)=0
GO TO 45
925  WRITE(6,6)
6  FORMAT(1H ,'ERROR: UNBALANCED PARENTHESIS')
WRITE(6,7) (WORK(I),I=1,240)
7  FORMAT(T2,2(/,T2,120A1))
IF(TERM.EQ.1) GO TO 45
ER=1
GO TO 45
950  WRITE(6,8)
8  FORMAT(1H ,'ERROR: RECORD LENGTH EXCEEDS 240 CHARACTERS')
WRITE(6,7) (WORK(I),I=1,240)
IF(TERM.EQ.1) GO TO 45
READ(4,2) (WORK(I),I=1,80)
WRITE(6,2)
ER=1
GO TO 45
1000  WRITE(6,999)
999  FORMAT(T2,'PROGRAM TERMINATION')
STOP 
END 
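
The comment block at label 870 describes finding the record numbers that satisfy every query keyword. The following is a minimal sketch of that idea in free-form modern Fortran, not a port of the listing's control flow. It assumes keyword groups separated by one marker value and a list closed by a second marker; the names `rectab`, `star`, `dols`, and `in_all_groups` are illustrative, and integer sentinels stand in for the listing's character constants.

```fortran
program rectab_demo
  implicit none
  integer, parameter :: star = -1, dols = -2   ! assumed sentinel values
  ! Three hypothetical keyword groups: {3,5,7}, {5,7,9}, {7,5}
  integer :: rectab(12) = [3, 5, 7, star, 5, 7, 9, star, 7, 5, star, dols]
  integer :: i

  i = 1
  do while (rectab(i) /= star)          ! walk the candidates in the first group
     if (in_all_groups(rectab(i), i)) print '(a,i0)', 'record ', rectab(i)
     i = i + 1
  end do

contains

  ! True if rec occurs in every group that follows the first one.
  logical function in_all_groups(rec, start)
    integer, intent(in) :: rec, start
    integer :: k
    logical :: found
    k = start
    do while (rectab(k) /= star)        ! skip the rest of the first group
       k = k + 1
    end do
    k = k + 1
    in_all_groups = .true.
    do while (rectab(k) /= dols)
       found = .false.
       do while (rectab(k) /= star)     ! scan one group
          if (rectab(k) == rec) found = .true.
          k = k + 1
       end do
       if (.not. found) then
          in_all_groups = .false.
          return
       end if
       k = k + 1                        ! step past the group separator
    end do
  end function in_all_groups
end program rectab_demo
```

With the sample data, records 5 and 7 appear in all three groups and are the only ones printed.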
SUBROUTINE CHAR4(ARR,F4,L)
IMPLICIT INTEGER*2 (A-Z)
DIMENSION ARR(241),F4(241)
DATA OP/'('/,CP/')'/,BLANK/' '/,COMMA/','/
C THIS SUBROUTINE STORES THE FIRST FOUR CHARACTERS OF
C EACH DATA ELEMENT IN THE ARRAY 'FRST4'. DURING DATA
C ELEMENT PROCESSING EACH CHARACTER BLOCK IS MOVED INTO
C THE MASTER INDEX IF NOT PREVIOUSLY ENTERED.
DO 5 I=1,241
5  F4(I)=BLANK
I=0
J=1
CCTR=0
7  I=I+1
IF(ARR(I).NE.OP) GO TO 7
10  I=I+1
IF(I.GE.L) GO TO 60
IF(ARR(I).EQ.OP.OR.ARR(I).EQ.CP.OR.ARR(I).EQ.COMMA)
1    GO TO 20
25  F4(J)=ARR(I)
J=J+1
CCTR=CCTR+1
IF(CCTR-4) 10,30,30
20  IF(ARR(I).EQ.OP.OR.CCTR.EQ.0) GO TO 10
35  F4(J)=BLANK
J=J+1
CCTR=CCTR+1
IF(CCTR-4) 35,30,30
30  CCTR=0
IF(ARR(I).NE.COMMA) GO TO 40
I=I+1
IF(ARR(I).NE.OP) GO TO 25
40  I=I+1
IF(I.GE.L) GO TO 60
IF(ARR(I).NE.OP.AND.ARR(I).NE.CP.AND.ARR(I).NE.COMMA)
1    GO TO 40
45  I=I+1
IF(I.GE.L) GO TO 60
IF(ARR(I).EQ.OP.OR.ARR(I).EQ.CP.OR.ARR(I).EQ.COMMA)
1    GO TO 45
GO  TO  25 
60  RETURN 
END 
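
CHAR4 reduces each data element to a four-character key, padding short elements with blanks. A simplified free-form Fortran sketch of that idea follows. The record string is made up for illustration, and the sketch relies on character assignment to truncate or pad rather than reproducing the listing's counter logic.

```fortran
program char4_demo
  implicit none
  ! Hypothetical record; the real record syntax is defined elsewhere in the thesis.
  character(len=*), parameter :: rec = 'NAME(ALPHA,BC,(DELTA))'
  character(len=4) :: key
  integer :: i, n, start

  n = len(rec)
  i = index(rec, '(')                   ! keys are taken after the first '('
  start = 0
  do while (i < n)
     i = i + 1
     if (scan(rec(i:i), '(),') > 0) then
        if (start > 0) then             ! a delimiter closes the current element
           key = rec(start:i-1)         ! assignment truncates or blank-pads to 4
           print '(3a)', '[', key, ']'
           start = 0
        end if
     else if (start == 0) then
        start = i                       ! first character of a new element
     end if
  end do
end program char4_demo
```

For the sample record this prints the keys `[ALPH]`, `[BC  ]`, and `[DELT]`.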

SUBROUTINE  INIT1 

IMPLICIT  INTEGER*2  (A-Z) 

DIMENSION TOP(603),RIGHT(602),DOWN(601)
COMMON/ONE/TOP,AVAIL1,P,Q,RR,DD,YY
EQUIVALENCE (TOP(3),RIGHT(2),DOWN(1))
C THIS SUBROUTINE INITIALIZES CELLS USED IN TREE
C STRUCTURES.
DO 10 I=1,601,3
TOP(I)=0
10  RIGHT(I)=0
DO 20 I=1,598,3
20  DOWN(I)=I+3

DOWN(601)=0 

RETURN 

END 

SUBROUTINE  GET1 

IMPLICIT  INTEGER*2  (A-Z) 

DIMENSION TOP(603),RIGHT(602),DOWN(601)
COMMON/ONE/TOP,AVAIL1,P,Q,RR,DD,YY
EQUIVALENCE (TOP(3),RIGHT(2),DOWN(1))
P=Q
Q=DOWN(Q)

RETURN 

END 
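
INIT1 and GET1 together implement an available-space list: INIT1 chains every cell to the next through `DOWN`, and GET1 hands out the cell at the head of the chain. The free-form Fortran sketch below shows the same pattern with three separate arrays instead of the listing's EQUIVALENCE overlay; `ncell`, `avail`, and `get_cell` are illustrative names, not identifiers from the thesis.

```fortran
program freelist_demo
  implicit none
  integer, parameter :: ncell = 5       ! the listing steps through 201 three-word cells
  integer :: top(ncell), right(ncell), down(ncell)
  integer :: avail, p, i

  ! Initialize: clear the data fields and chain every cell to the next one.
  top = 0
  right = 0
  do i = 1, ncell - 1
     down(i) = i + 1
  end do
  down(ncell) = 0                       ! 0 marks the end of the free chain
  avail = 1

  ! Take two cells from the head of the chain.
  call get_cell(p)
  print '(a,i0,a,i0)', 'got cell ', p, ', next free ', avail
  call get_cell(p)
  print '(a,i0,a,i0)', 'got cell ', p, ', next free ', avail

contains

  subroutine get_cell(cell)
    integer, intent(out) :: cell
    cell = avail                        ! hand out the head of the chain
    avail = down(avail)                 ! advance the head to the next free cell
  end subroutine get_cell
end program freelist_demo
```

As in GET1, the sketch does not check for an exhausted chain; a caller that can run out of cells would need to test for a zero head first.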


SUBROUTINE INIT2
IMPLICIT INTEGER*2 (A-Z)
DIMENSION RNO(603),CMEM(602),NEXT(601)
COMMON/TWO/RNO,AVAIL2,S,R,CM,RM
EQUIVALENCE (RNO(3),CMEM(2),NEXT(1))
C THIS SUBROUTINE INITIALIZES CLASS MEMBERSHIP CELLS
DO 10 I=1,601,3
RNO(I)=0
10  CMEM(I)=0
DO 20 I=1,598,3
20  NEXT(I)=I+3
NEXT(601)=0
RETURN
END


SUBROUTINE  GET2 

IMPLICIT  INTEGER*2  (A-Z) 

DIMENSION RNO(603),CMEM(602),NEXT(601)
COMMON/TWO/RNO,AVAIL2,S,R,CM,RM
EQUIVALENCE (RNO(3),CMEM(2),NEXT(1))
R=S
S=NEXT(S)

RETURN 

END 


SUBROUTINE MISRCH
IMPLICIT INTEGER*2 (A-Z)
DIMENSION ACUM(241)
DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000)
COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
DATA BLANK/' '/,STAR/'*'/
C THIS SUBROUTINE SEARCHES THE MASTER INDEX FOR THE
C WORD CONTAINED IN THE ARRAY 'SERCH'.
ANS=0
SJ=0
5  SJ=SJ+1
SK=1
I=1
10  IF(MINDEX(SJ,SK).EQ.SERCH(I)) GO TO 15
SJ=SJ+1
IF(MINDEX(SJ,SK).EQ.BLANK) GO TO 200
GO TO 10
15  IF(N-4) 16,16,20
16  DO 17 J=2,4
IF(MINDEX(SJ,J).NE.SERCH(J)) GO TO 5
17  CONTINUE
GO TO 100
20  I=I+1
KK=MINDEX(SJ,6)+1
IF(SEQ(KK).NE.SERCH(I)) GO TO 5
55  KK=KK+1
IF(SEQ(KK).EQ.STAR) GO TO 90
I=I+1
IF(SEQ(KK).EQ.SERCH(I)) GO TO 55
GO TO 5
90  I=I+1
IF(SERCH(I).NE.BLANK) GO TO 200
100  ANS=1
200  RETURN
END

SUBROUTINE IDENT(*,*,*)
IMPLICIT INTEGER*2 (A-Z)
DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000),ACUM(241)
COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
DATA OP/'('/,CP/')'/,COMMA/','/,HYP/'-'/,AMP/'&'/,COL/':'/
C THIS SUBROUTINE DETERMINES WHETHER THE ACCUMULATOR
C CONTAINS AN INPUT RECORD OR A QUERY AND RETURNS TO
C THE APPROPRIATE CODE BLOCK IN THE M/PROG FOR
C FURTHER TESTING AND PROCESSING.
F1=0
F2=0
F3=0
F4=0
DO 10 I=1,N
IF(ACUM(I).EQ.OP) F1=1
IF(ACUM(I).EQ.HYP) F2=1
IF(ACUM(I).EQ.COL) F3=1
IF(ACUM(I).EQ.AMP) F4=1
10  CONTINUE
11  IF(F1.EQ.0) GO TO 40
IF(F2.EQ.0.AND.F3.EQ.0.AND.F4.EQ.0) GO TO 30
IF(F3.EQ.1.OR.F4.EQ.1) GO TO 50
J=1
15  IF(ACUM(J).EQ.OP) GO TO 17
J=J+1
GO TO 15
17  DO 20 I=J,N
IF(ACUM(I).EQ.OP.OR.ACUM(I).EQ.CP.OR.ACUM(I).EQ.COMMA)
1    GO TO 20
IF(ACUM(I).NE.HYP) GO TO 50
20  CONTINUE
21  GO TO 60
30  RETURN
40  RETURN 1
50  RETURN 2
60  RETURN 3
END


SUBROUTINE TRAV
IMPLICIT INTEGER*2 (A-Z)
DIMENSION TOP(603),RIGHT(602),DOWN(601)
DIMENSION LEVEL(50),NODTAB(50)
DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000),ACUM(241)
COMMON/ONE/TOP,AVAIL1,P,Q,RR,DD,YY
COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
COMMON/FOUR/NODTAB,FD
EQUIVALENCE (TOP(3),RIGHT(2),DOWN(1))
DATA LX/'L'/,BLANK/' '/
C THIS SUBROUTINE TRAVERSES THE FORMAT TREE AND LOCATES
C THE END NODES OF EACH BRANCH. THE CLASS CORRESPONDING
C TO EACH END NODE IS STORED IN THE ARRAY 'NODTAB'.
DO 10 I=1,50
10  NODTAB(I)=BLANK
I=0
AVAIL1=MINDEX(SJ,8)
RR=DOWN(AVAIL1)
L=1
12  LEVEL(L)=AVAIL1
DD=RR
C EACH LEVEL IN THE TREE IS ASSIGNED A NUMBER WHICH IS
C STORED IN A STACK. AS THE TREE IS TRAVERSED
C THE STACK IS PUSHED DOWN OR POPPED UP ACCORDINGLY.
15  IF(DOWN(DD).EQ.0) GO TO 20
L=L+1
LEVEL(L)=DD
DD=DOWN(DD)
GO TO 15
20  I=I+1
NODTAB(I)=TOP(DD)
MINDEX(TOP(DD),5)=LX
IF(RIGHT(DD).EQ.0) GO TO 25
DD=RIGHT(DD)
GO TO 15
25  IF(LEVEL(L).EQ.RR) GO TO 30
IF(LEVEL(L).EQ.AVAIL1) GO TO 35
DD=LEVEL(L)
IF(RIGHT(DD).NE.0) GO TO 27
L=L-1
GO TO 25
27  DD=RIGHT(DD)
L=L-1
GO TO 15
30  IF(RIGHT(RR).EQ.0) GO TO 35
RR=RIGHT(RR)
IF(DOWN(RR).NE.0) GO TO 12
I=I+1
NODTAB(I)=TOP(RR)
MINDEX(TOP(RR),5)=LX
GO TO 30
35  RETURN 

END 
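
TRAV collects the end nodes of a tree stored with a first-child link (`DOWN`) and a next-sibling link (`RIGHT`), using an explicit stack of levels. The free-form Fortran sketch below shows that general technique on a small made-up tree. The tree shape, the array sizes, and the names `stack`, `sp`, and `leaves` are assumptions for illustration; the loop is a generic traversal rather than a transcription of TRAV's branching.

```fortran
program leaves_demo
  implicit none
  integer, parameter :: n = 6
  ! Hypothetical tree: 1 -> (2, 3), 2 -> (4, 5), 3 -> (6)
  integer :: down(n)  = [2, 4, 6, 0, 0, 0]   ! first child, 0 if none
  integer :: right(n) = [0, 3, 0, 5, 0, 0]   ! next sibling, 0 if none
  integer :: stack(n), sp, node, leaves(n), nleaf, i

  nleaf = 0
  sp = 0
  node = down(1)                        ! start below the root
  do while (node /= 0)
     if (down(node) /= 0) then
        sp = sp + 1                     ! push the parent and descend
        stack(sp) = node
        node = down(node)
     else
        nleaf = nleaf + 1               ! end node: record it
        leaves(nleaf) = node
        node = right(node)
        do while (node == 0 .and. sp > 0)
           node = right(stack(sp))      ! pop and try the parent's sibling
           sp = sp - 1
        end do
     end if
  end do
  print '(a,6(1x,i0))', 'end nodes:', (leaves(i), i = 1, nleaf)
end program leaves_demo
```

For the sample tree the end nodes are visited left to right as 4, 5, 6.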

SUBROUTINE  MENT 

IMPLICIT  INTEGER*2  (A-Z) 

DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000),ACUM(241)
COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
COMMON/FIVE/MULT,MK,EK
DATA OP/'('/,CP/')'/,STAR/'*'/
C 'MK' SCANS THE DATA ELEMENT FROM ITS FIRST 'OP'
C (COUNTING 'OP'S) UNTIL THE FIRST CHARACTER IS
C ENCOUNTERED. THEN 'EK' SCANS FROM 'MK' (COUNTING 'CP'S)
C UNTIL AN 'OP' IS ENCOUNTERED. IF THE PARENTHESES ARE
C UNBALANCED WHEN 'EK' STOPS THEN A MULTIPLE ENTRY EXISTS.
PCTR=0
MULT=0
5  IF(ACUM(MK).EQ.OP) GO TO 10
MK=MK+1
IF(ACUM(MK).EQ.STAR) GO TO 30
GO TO 5
10  PCTR=PCTR+1
MK=MK+1
IF(ACUM(MK).EQ.STAR) GO TO 30
IF(ACUM(MK).EQ.OP) GO TO 10
15  EK=MK
20  EK=EK+1
IF(ACUM(EK).EQ.CP) PCTR=PCTR-1
IF(ACUM(EK).EQ.STAR) GO TO 30
IF(ACUM(EK).EQ.OP) GO TO 25
GO TO 20
25  IF(PCTR.NE.0) MULT=1
30  RETURN 

END 
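
The parenthesis-counting test in MENT can be illustrated on an ordinary character string. The free-form Fortran sketch below follows the same steps: count the run of opening parentheses, subtract one for each closing parenthesis, and report a multiple entry if a new opening parenthesis arrives while the count is still nonzero. The function name `multiple` and the sample strings are assumptions, with `*` taken as the end-of-element marker as in the listing.

```fortran
program ment_demo
  implicit none
  print '(a,l2)', '((A)(B))* ->', multiple('((A)(B))*')
  print '(a,l2)', '(A)(B)*   ->', multiple('(A)(B)*')

contains

  logical function multiple(s)
    character(len=*), intent(in) :: s
    character(len=1) :: c
    integer :: k, k0, depth
    logical :: in_run
    multiple = .false.
    depth = 0
    k0 = index(s, '(')
    if (k0 == 0) return
    in_run = .true.                     ! still inside the leading run of '('
    do k = k0, len(s)
       c = s(k:k)
       if (c == '*') return             ! end marker: no multiple entry seen
       if (in_run) then
          if (c == '(') then
             depth = depth + 1
          else
             in_run = .false.           ! first character of the element
          end if
       else if (c == ')') then
          depth = depth - 1
       else if (c == '(') then
          multiple = (depth /= 0)       ! reopened before the run was closed
          return
       end if
    end do
  end function multiple
end program ment_demo
```

The first sample string reports true because the second `(` is met while one level is still open; the second reports false.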

SUBROUTINE  QSCAN(*,*,*) 

IMPLICIT  INTEGER*2  (A-Z) 

DIMENSION  UPRAN(30) 

DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000),ACUM(241)
COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
COMMON/SIX/S2,AT,AE,HCTR,UPRAN
DATA OP/'('/,CP/')'/,COMMA/','/,BLANK/' '/,HYP/'-'/,STAR/'*'/,
1    COL/':'/,AMP/'&'/
C THIS SUBROUTINE LOCATES KEYWORDS IN THE QUERY,
C DETERMINES IF ALPHABETIC/NUMERIC RANGES ARE REQUESTED,
C LOADS THE 'SERCH' ARRAY, AND RETURNS TO THE M/PROG FOR
C QUERY PROCESSING.
S1=0
S2=0
AT=N+1
10  IF(ACUM(AT).NE.OP) GO TO 15
AT=AT+1
GO TO 10
15  HCTR=0
IF(ACUM(AT).EQ.HYP) GO TO 40
20  AE=AT
IF(S2.EQ.1) GO TO 22
IF(S1.EQ.0) HCTR=HCTR+1
22  DO 25 I=1,30
SERCH(I)=BLANK
UPRAN(I)=BLANK
25  CONTINUE
N=1
30  SERCH(N)=ACUM(AE)
AE=AE+1
IF(ACUM(AE).NE.COMMA.AND.ACUM(AE).NE.CP) GO TO 32
S1=0
S2=0
GO TO 50
32  IF(ACUM(AE).NE.COL) GO TO 34
C ARRAY 'SERCH' IS LOADED WITH THE LOWER RANGE LIMIT
C AND ARRAY 'UPRAN' IS LOADED WITH THE UPPER RANGE LIMIT
S1=1
UI=1
AE=AE+1
31  UPRAN(UI)=ACUM(AE)
AE=AE+1
IF(ACUM(AE).EQ.COMMA.OR.ACUM(AE).EQ.CP) GO TO 60
UI=UI+1
GO TO 31
34  IF(ACUM(AE).EQ.AMP) GO TO 35
N=N+1
GO TO 30
35  S2=1
GO TO 50
40  HCTR=HCTR+1
ENTRY QSCAN1(*,*,*)
45  AT=AT+1
IF(ACUM(AT).EQ.STAR) GO TO 70
IF(ACUM(AT).EQ.OP.OR.ACUM(AT).EQ.COMMA.OR.ACUM(AT).EQ.
1    GO TO 45

IF(ACUM(AT).EQ.HYP)  GO  TO  40 

GO  TO  20 
50  RETURN  1 
60  RETURN  2 
70  RETURN  3 

END 
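
QSCAN splits a range keyword at the colon, loading the lower limit into 'SERCH' and the upper limit into 'UPRAN'. The free-form Fortran sketch below shows only that splitting step, using character strings instead of one-character-per-word arrays. The routine name `split_keyword` and the sample keywords are assumptions for illustration.

```fortran
program range_demo
  implicit none
  character(len=30) :: serch, upran

  call split_keyword('ADAMS:BAKER', serch, upran)
  print '(4a)', 'lower=', trim(serch), ' upper=', trim(upran)
  call split_keyword('NORFOLK', serch, upran)
  print '(4a)', 'lower=', trim(serch), ' upper=', trim(upran)

contains

  subroutine split_keyword(word, lower, upper)
    character(len=*), intent(in) :: word
    character(len=*), intent(out) :: lower, upper
    integer :: c
    c = index(word, ':')
    if (c == 0) then
       lower = word                     ! plain keyword, no range requested
       upper = ' '
    else
       lower = word(:c-1)               ! text before the colon
       upper = word(c+1:)               ! text after the colon
    end if
  end subroutine split_keyword
end program range_demo
```

A keyword without a colon leaves the upper limit blank, which mirrors how the listing blanks 'UPRAN' before each keyword is loaded.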